. . . we always have had a great deal of difficulty in understanding the world view that quantum mechanics represents. At least I do, because I'm an old enough man that I haven't got to the point that this stuff is obvious to me. Okay, I still get nervous with it. And therefore, some of the younger students. . . you know how it always is, every new idea, it takes a generation or two until it becomes obvious that there's no real problem. It has not yet become obvious to me that there's no real problem. I cannot define the real problem, therefore I suspect there's no real problem, but I'm not sure there's no real problem.

Richard Feynman 
Abstract



In this thesis we explore the question: "what's strange about quantum mechanics?"

This exploration is divided into two parts. In the first, we prove that there is in fact something strange about quantum mechanics, by showing that it is not possible to reconcile quantum theory with various definitions of what a "normal" theory should be, that is, a theory that respects our classical intuition. In the second part, our objective is to describe precisely which parts of quantum mechanics are "non-classical". For that, we define a "classical" theory as a noncontextual ontological theory, and the "non-classical" parts of quantum mechanics as the probability distributions that a noncontextual ontological theory cannot reproduce. Exploring this formalism, we find a new family of inequalities that characterize "non-classicality".
Resumo



In this dissertation we explore the question: "what is strange about quantum mechanics?"

This exploration is divided into two parts: in the first, we prove that there is indeed something strange about quantum mechanics, by showing that it is not possible to reconcile the quantum formalism with various different definitions of what a "normal" theory would be, that is, one that respects our classical intuition about the world. In the second part, our objective is to describe precisely which parts of quantum mechanics are "non-classical". To do so, we define a "classical" theory as a noncontextual ontological theory, and the "non-classical" parts of quantum mechanics as the probability distributions that a noncontextual ontological theory cannot reproduce. Exploring this formalism, we find a new family of inequalities that characterize this "non-classicality".
Introduction



Quantum mechanics is magic 

Daniel Greenberger 

This thesis is meant to explore the question posed by Chris Fuchs: what is "Zing!" [ ]? What is the property of quantum mechanics that is essentially quantum, absent from any classical theory? Unlike Chris Fuchs, we take an operationalist rather than an axiomatic approach: our "Zing!" is not a deep axiom that reveals the essence of quantum theory, but rather logically connected sets of probability distributions that cannot be reproduced by any classical theory. Although finding such an axiom would be nice, we feel that our approach is more useful, as these sets of probability distributions are the resources needed for quantum magic: quantum computing and quantum key distribution.

This is emphatically not a historical account of the subject: such accounts are plentiful, and another one is unnecessary. Therefore, we shall try to keep references to the great works of von Neumann, Bell, Kochen, and Specker to a bare minimum, while emphasising the newer 1 works of Abramsky, Busch, Cabello, Hardy, Pitowsky, and Spekkens. The sole exception shall be the work of George Boole, which, although very old, is still largely unknown.

Having given a general picture of my motivations and goals, let me now give a more detailed account of the structure of this thesis.

Chapter 1 presents introductory material 2 on the question "is quantum 
mechanics really different from 'classical' theories?". It begins by capturing 
some notions of classicality within the framework of ontological theories; then 
this question is made more precise as "is there an ontological embedding of 
quantum theory?". 

The chapter proceeds by detailing specific ontological models and showing which problems arise in trying to reproduce the results of quantum mechanics within them. These problems are then understood as failures to respect noncontextuality, a notion that we argue is fundamental in defining classicality. After giving a precise definition of noncontextuality, we proceed to prove Spekkens' theorem on the impossibility of embedding quantum theory within a preparation noncontextual ontological model.

1 As a result, the median year of publication of our references is 2002.

2 The reader who is already well acquainted with the subject (or a mathematician) may prefer to skip it.
We then revisit our assumptions and ask whether a less ambitious notion of classicality can embed quantum theory. To do that, we revisit the historical theorems of von Neumann and Gleason, culminating in the recent version by Busch. In each of their frameworks, a "classical" formulation of quantum mechanics is again ruled out.

The next stop is the famous theorem of Kochen and Specker, which uses the weakest assumptions yet. We present three recent versions of it, by Cabello et al., Yu and Oh, and Peres and Mermin, which are considerable simplifications of the original proof.

The chapter concludes by presenting a recent theorem of Hardy, that "any 
ontological embedding of quantum theory is very uncomfortable", and two 
specific contextual ontological embeddings of quantum theory. 

Our conclusion is then that any reasonable ontological embedding of quantum theory is impossible; therefore there is something more in quantum mechanics that classical theories cannot quite capture. Chapter 2 is then dedicated to detailing what this something is.

We begin by constructing our final definition of noncontextuality. Based on the recent work of Abramsky and Brandenburger, we show that Fine's theorem admits a natural generalization that applies to any set of observables, without regard to spatial separation. This generalization in turn motivates a definition of noncontextuality that is a natural generalization of the definition of locality, with mostly the same mathematical structure; this allows us to consider generalizations of Bell inequalities that test noncontextuality instead of locality. Interestingly, this "new" definition was already implicit in the much older works of Boole (and in the more recent works by Pitowsky), which motivates us to call these generalized Bell inequalities Boole inequalities.

This "new" approach is then formalized via a classical problem in mathematics, the marginal problem. Using its formalism, we gain access to powerful tools to separate contextual from noncontextual probability distributions, and with them we derive a new result: a set of Boole inequalities that completely describes an infinite family of noncontextual polytopes.



Notation and definitions 



The purpose of this part of the thesis is only to establish notation, not to teach quantum mechanics. Readers who need such an introduction may consult the excellent book by Michael Nielsen and Isaac Chuang [ ].

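We say that an operator $A$ is self-adjoint, i.e., $A = A^\dagger$, if $\langle\varphi|A\psi\rangle = \langle A\varphi|\psi\rangle = \langle\varphi|A|\psi\rangle$ for all $|\varphi\rangle, |\psi\rangle$. We shall only deal with finite-dimensional operators. The set of all self-adjoint operators is $\mathcal{O}(\mathcal{H})$.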

A quantum-mechanical observable is a self-adjoint operator. 

We say that an operator $A$ is positive, i.e., $A \geq 0$, if $\langle\psi|A|\psi\rangle \geq 0$ for all $|\psi\rangle$.
A quantum state $\rho$ is a positive operator such that $0 \leq \operatorname{tr}\rho \leq 1$ [3]. Since we shall have no use for states with $\operatorname{tr}\rho < 1$, we can omit the normalization of our quantum states without ambiguity. The set of all quantum states is $\mathcal{D}(\mathcal{H})$. A pure quantum state is an extremal point of $\mathcal{D}(\mathcal{H})$, a rank-one projector $\psi$. The vector of a pure quantum state will be denoted by $|\psi\rangle$, and the vectors are connected to the projectors by

$$\psi = |\psi\rangle\langle\psi|.$$

The set of all pure states is $\mathcal{PH}$.

An effect $E$ is a positive operator smaller than the identity, i.e., $0 \leq E \leq \mathbb{1}$. The set of all effects is $\mathcal{E}(\mathcal{H})$. A set of effects $\{E_i\}$ such that $\sum_i E_i = \mathbb{1}$ describes a measurement 3 and is called a POVM.

A projector $\Pi$ is a self-adjoint operator such that $\Pi^2 = \Pi$. The set of all projectors is $\mathcal{P}(\mathcal{H})$. A set of projectors $\{\Pi_i\}$ such that $\sum_i \Pi_i = \mathbb{1}$ describes a measurement and is called a PVM. Note that a PVM is a special case of a POVM.

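The Born rule is the quantum-mechanical rule for associating measurement probabilities with states and effects. We say that the probability of obtaining outcome $i$ when measuring the POVM $E$ on the state $\rho$ is

$$p(i|\rho,E) = \operatorname{tr}\rho E_i.$$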
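As a concrete illustration of the Born rule, here is a minimal sketch in Python (standard library only, with matrices as nested lists of complex numbers; all helper names are our own, not from any library). It computes $p(i|\rho,E) = \operatorname{tr}\rho E_i$ for a qubit and checks that the chosen PVM sums to the identity.

```python
# Minimal sketch of the Born rule p(i|rho, E) = tr(rho E_i) for a qubit.
# Matrices are nested lists of complex numbers; all helper names are hypothetical.

def matmul(a, b):
    """Product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def projector(v):
    """Rank-one projector |v><v| for a normalized vector v."""
    return [[vi * vj.conjugate() for vj in v] for vi in v]

def sums_to_identity(effects, tol=1e-12):
    """POVM completeness check: sum_i E_i == 1 (positivity is not checked)."""
    n = len(effects[0])
    return all(abs(sum(e[i][j] for e in effects) - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))

def born(rho, effects):
    """Outcome probabilities tr(rho E_i); the imaginary part is rounding noise."""
    return [trace(matmul(rho, e)).real for e in effects]

s = 2 ** -0.5
rho = projector([s + 0j, s + 0j])                          # the pure state |+><+|
pvm = [projector([1 + 0j, 0j]), projector([0j, 1 + 0j])]   # {|0><0|, |1><1|}

assert sums_to_identity(pvm)
print(born(rho, pvm))  # both probabilities are 1/2 up to rounding
```

Positivity of the effects is deliberately not checked here, to keep the sketch short.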



3 Except for the post-measurement state. 
Chapter 1



Ontological embeddings of quantum theory

Classical measurements reveal information. 
Quantum measurements produce information. 

Marcelo Terra Cunha 

The quest for embedding quantum mechanics in a "classical" theory is almost as old as quantum theory itself. People were disturbed by the role of measurement in the theory, particularly by its intrinsic randomness and non-repeatability. So they tried to explain away these features as emergent rather than fundamental, arising from a lack of control over, and understanding of, a more refined theory that would describe the "deeper" physics behind quantum phenomena. We call this refined theory an ontological theory.

But despite being familiar, the words "classical" and "ontological" have 
very fuzzy meanings. In the next section we shall pin them down and clarify 
them. 

1.1 What is an ontological theory? 

The first ontological models that appeared tried to "solve" the problem of non-determinism. They postulated that $\psi$ was not the real state of nature, but rather some kind of shadow of it. So they postulated that there was a real state, an ontic state 1, called $\lambda$, that if known would render all measurement outcomes deterministic. That is, given a PVM 2 $M = \{M_k\}$, the probability of outcome $k$ given $\lambda$ would be either 0 or 1, so we can define a response function

$$\xi_{k|M}: \Lambda \to \{0,1\},$$

1 The reader who is well acquainted with the subject might be wondering when the expression "hidden variable" will appear. Well, it won't.

2 Even the most determined determinist can't hope for a POVM to be deterministic. We'll 
explain why in a while. 
such that $\xi_{k|M}(\lambda)$ is the probability of outcome $k$. Here, $\Lambda$ is any space in which our ontic states $\lambda$ are defined, and to account for the fact that $\sum_k M_k = \mathbb{1}$, we require that $\sum_k \xi_{k|M}(\lambda) = 1$ for all $\lambda \in \Lambda$. This is just the requirement that some outcome must occur in a measurement.

Then the subjective indeterminism of quantum theory would be recovered from our ignorance of which ontic states were really present in an experiment. That is, a quantum state $\psi$ would determine a probability distribution $\mu_\psi(\lambda)$ over $\Lambda$. This property can be thought of as "you were trying to generate state $\psi$, but you ended up generating an ensemble of ontic states $\mu_\psi(\lambda)$". As in quantum (and classical) mechanics, we shall call the ensemble $\mu_\psi(\lambda)$ itself a state, while reserving the term pure ontic state for the individual $\lambda$, which can of course be represented as an ensemble with a $\delta$ distribution.

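Of course, we want this subjective indeterminism to agree with the predictions of quantum mechanics, so

$$p(k|\psi,M) = \int_\Lambda d\lambda\, \mu_\psi(\lambda)\, \xi_{k|M}(\lambda) = \operatorname{tr}\psi M_k.$$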


1.1.1 On mixed states and POVMs 

The early literature on ontological theories did not make this separation between states and measurements 3 [4, ]; instead, it tried to define a deterministic value function $v(M_k,\psi,\lambda)$ that would give with certainty the outcome of an experiment, given the quantum state and the ontic state, and recover the quantum statistics by averaging over $\lambda$. This is quite problematic, since it can only describe models in which $\psi$ itself has an ontic status 4; it therefore can never describe experiments where the quantum state is explicitly epistemic, e.g., a mixed state. For instance, suppose we have two pure states $\psi$ and $\varphi$ with different deterministic outcomes $v(M_k,\psi,\lambda)$ and $v(M_k,\varphi,\lambda)$. Then if I prepare state $\psi$ with probability $p$ or state $\varphi$ with probability $1-p$, corresponding to the mixed state $\rho = p\psi + (1-p)\varphi$, the outcome must be

$$v(M_k,\rho,\lambda) = p\,v(M_k,\psi,\lambda) + (1-p)\,v(M_k,\varphi,\lambda),$$

which is neither 0 nor 1 for non-trivial $p$, a contradiction.

Using probability distributions as we do, this can be accommodated in a very natural manner:

Lemma 1. If one prepares the quantum states $\psi_i$ with probabilities $p_i$, then the corresponding ontic state is

$$\mu_{(p_i,\psi_i)}(\lambda) = \sum_i p_i\, \mu_{\psi_i}(\lambda). \qquad (1.1)$$

Proof. Quantum mechanics tells us that $p(k|(p_i,\psi_i),M) = \sum_i p_i\, p(k|\psi_i,M)$. Writing these probabilities ontologically, we have 5

$$\int_\Lambda \mu_{(p_i,\psi_i)}\, \xi_{k|M} = \sum_i p_i \int_\Lambda \mu_{\psi_i}\, \xi_{k|M}.$$

3 With the honourable exception of the Kochen-Specker model, discussed in section 1.3.1.
4 See section 1.3 for further discussion of this point.

5 When doing calculations we shall often omit the integration variable $\lambda$, but only when there's no risk of ambiguity.
Since $\xi_{k|M}$ is positive and arbitrary, this implies that

$$\mu_{(p_i,\psi_i)}(\lambda) = \sum_i p_i\, \mu_{\psi_i}(\lambda).$$

□

Note that this same rule is used to describe convex combinations of states 
in quantum and classical mechanics. 

The issue with POVMs is similar: one can implement the POVM

$$E = \{p|0\rangle\langle 0|,\ p|1\rangle\langle 1|,\ (1-p)|+\rangle\langle +|,\ (1-p)|-\rangle\langle -|\}$$

simply by measuring the PVM $M = \{|0\rangle\langle 0|, |1\rangle\langle 1|\}$ with probability $p$ and the PVM $N = \{|+\rangle\langle +|, |-\rangle\langle -|\}$ with probability $1-p$ [6]; we must then have $\xi_{0|E}(\lambda) = p\,\xi_{0|M}(\lambda)$, which is obviously not deterministic. We must accept, then, that for these kinds of "mixed" POVMs 6 the response functions must be modified to

$$\xi_{k|E}: \Lambda \to [0,1],$$

that is, allowing the whole interval $[0,1]$ as image.
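The claim that $E$ is just a random choice between $M$ and $N$ can be checked numerically. The Python sketch below (standard library only, real state vectors for brevity; the value $p = 0.3$, the state, and the helper names are arbitrary choices of ours) compares $\operatorname{tr}\rho E_k$ with the statistics of first picking $M$ or $N$ at random and then measuring it.

```python
# Sketch: the "mixed" POVM E = {p|0><0|, p|1><1|, (1-p)|+><+|, (1-p)|-><-|}
# reproduces the statistics of measuring M = {|0><0|, |1><1|} with probability p
# and N = {|+><+|, |-><-|} with probability 1 - p. Real vectors only, for brevity.

def projector(v):
    return [[a * b for b in v] for a in v]

def tr_product(x, y):
    """tr(x y) for square matrices given as nested lists."""
    n = len(x)
    return sum(x[i][k] * y[k][i] for i in range(n) for k in range(n))

def scale(c, m):
    return [[c * x for x in row] for row in m]

s = 2 ** -0.5
M = [projector([1.0, 0.0]), projector([0.0, 1.0])]
N = [projector([s, s]), projector([s, -s])]

def mixed_povm(p):
    return [scale(p, M[0]), scale(p, M[1]), scale(1 - p, N[0]), scale(1 - p, N[1])]

p = 0.3
rho = projector([0.6, 0.8])  # an arbitrary real pure state

direct = [tr_product(rho, e) for e in mixed_povm(p)]
# Two-step procedure: outcome k of M is reported as k, outcome k of N as 2 + k.
two_step = ([p * tr_product(rho, m) for m in M]
            + [(1 - p) * tr_product(rho, n) for n in N])

print(direct)
print(two_step)  # the same list, up to rounding
```

Matching statistics is exactly why a response function for $E$ cannot be forced to take only the values 0 and 1.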

For "pure" POVMs, this argument does not apply, and we cannot decide a priori whether to demand that they be deterministic. In fact, it is fruitful to allow even PVMs to be objectively non-deterministic 7, so we shall not exclude this possibility.

The most general case is, therefore,

$$p(k|\rho,E) = \int_\Lambda d\lambda\, \mu_\rho(\lambda)\, \xi_{k|E}(\lambda) = \operatorname{tr}\rho E_k, \qquad (1.2)$$

and this is what an ontological theory should strive to reproduce, only falling back to pure states and PVMs when unavoidable.

1.2 Ontological models 

With the definitions given in the previous section, it is already possible to 
construct some examples of ontological theories, to examine their features in 
a more concrete manner. 

1.2.1 The naive ontology 

If we allow an ontological model to have objective non-determinism, what do we gain relative to quantum mechanics? Not much, actually. This ontological model is so similar to quantum mechanics that it can be confused with a naive interpretation of it, one that ascribes ontological status to the pure states. Nevertheless, it is quite useful to examine meticulously this ontological

6 Following [ ], we call "mixed" the POVMs that can be written as a convex combination of different POVMs, and "pure" those that can't.

7 However discomforting that may seem to some people, it's certainly a milder discomfort than abandoning the notion of reality altogether, as in quantum mechanics. See section 1.2.1.
model, to be aware of the problems that such a naive interpretation has. This particular model was first proposed by [ ], and further explored in [ ].

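In this model, we consider the pure states $\psi$ to be the ontic states $\lambda$, so we identify the ontic state space $\Lambda$ with $\mathcal{PH}$, and define

$$\mu_\psi(\lambda) = \delta(\lambda - \psi).$$

The response function is then

$$\xi_{k|E}(\lambda) = \operatorname{tr}\lambda E_k,$$

and we recover the results of quantum mechanics by

$$p(k|\psi,E) = \int_\Lambda d\lambda\, \delta(\lambda - \psi)\operatorname{tr}\lambda E_k = \operatorname{tr}\psi E_k.$$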

We can see, then, that mathematically this ontological model is quite trivial. One interesting thing to examine, though, is the representation of mixed states in this formalism. Following lemma 1, we see that

$$\rho = \sum_i p_i \psi_i \implies \mu_\rho(\lambda) = \sum_i p_i\, \delta(\lambda - \psi_i),$$

which trivially reproduces the required quantum statistics. The problem with this approach, however, is that the ontic state $\mu_\rho(\lambda)$ depends on which convex decomposition of $\rho$ we chose to use. This makes the notation $\mu_\rho$ suspect, since it should actually be $\mu_{(p_i,\psi_i)}$, and blatantly violates the C*-algebraic definition of state [ ], which requires states that give rise to the same statistics to have the same mathematical representation. We call this (unwanted) feature preparation contextuality, which we shall define more carefully in section 1.4.
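To make this decomposition dependence concrete, the Python sketch below (standard library only; the two decompositions of $\mathbb{1}/2$, the labels, and the helper names are our illustrative choices) stores a naive-model ontic state $\sum_i p_i\,\delta(\lambda - \psi_i)$ as a dictionary from pure-state labels to weights. The two decompositions give the same density matrix, hence the same statistics, but different ontic states.

```python
# Sketch of preparation contextuality in the naive ontology. An ontic state
# sum_i p_i delta(lambda - psi_i) is stored as a dict {pure-state label: p_i};
# the labels, vectors and helper names are illustrative choices.

s = 2 ** -0.5
kets = {"0": [1.0, 0.0], "1": [0.0, 1.0], "+": [s, s], "-": [s, -s]}

def projector(v):
    """|v><v| for a real vector v."""
    return [[a * b for b in v] for a in v]

def density(ontic_state):
    """rho = sum_i p_i psi_i, the quantum state this ontic state reproduces."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for label, weight in ontic_state.items():
        proj = projector(kets[label])
        for i in range(2):
            for j in range(2):
                rho[i][j] += weight * proj[i][j]
    return rho

mu_z = {"0": 0.5, "1": 0.5}   # rho = (|0><0| + |1><1|)/2
mu_x = {"+": 0.5, "-": 0.5}   # rho = (|+><+| + |-><-|)/2

rho_z, rho_x = density(mu_z), density(mu_x)
same_rho = all(abs(rho_z[i][j] - rho_x[i][j]) < 1e-12
               for i in range(2) for j in range(2))
same_ontic = mu_z == mu_x

print(same_rho, same_ontic)  # True False
```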

Recall that it is common for beginners to be surprised by the fact that it is impossible to know which convex combination was actually used to construct a given density matrix. If we regard the pure states as ontological, this surprise becomes quite natural, since the mystery is why the state $\mu_{(p_i,\psi_i)}$ should give the same statistics as the state $\mu_{(q_i,\varphi_i)}$ when $\sum_i p_i\psi_i = \sum_i q_i\varphi_i$.

To solve this problem, one might be tempted to ignore common sense (and lemma 1) and ascribe ontological status to mixed states, identifying $\Lambda$ with $\mathcal{D}(\mathcal{H})$ instead of $\mathcal{PH}$; then the ontic states would be just

$$\mu_\rho(\lambda) = \delta(\lambda - \rho),$$

relieving us of the dependence on the decomposition. But this is in fact a terrible idea, since one can always write a mixed state $\rho$ as a convex combination of two different states $\sigma_0$ and $\sigma_1$, as

$$\rho = p\sigma_0 + (1-p)\sigma_1.$$

If you want to regard every mixed state as ontological, you have, by lemma 1,

$$\delta(\lambda - \rho) = p\,\delta(\lambda - \sigma_0) + (1-p)\,\delta(\lambda - \sigma_1),$$

a flat-out contradiction.
One can now begin to suspect that it is not possible to avoid preparation contextuality; this will be proved in section 1.5. For now, we see that even the most humble ontological model, which does not even provide determinism, already has some very undesirable features. One may then ask whether a deterministic ontological model is possible at all; fortunately, this question was answered in the positive a long time ago. We shall see how in the next subsection.

1.2.2 Constructing a deterministic ontological model 

In 1964, Bell had an idea for how to make a deterministic ontological model [ ]: hide the quantum-mechanical probability of an outcome in the measure of the set of ontic states associated with that outcome. I shall present here a modified version of his model that makes this point quite clear.

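This model can describe in a deterministic way the measurement of a one-qubit PVM $\Pi = \{\Pi_0, \Pi_1\}$. The ontic space is $\Lambda = \mathcal{PH} \times [0,1]$, with ontic variable $\lambda = (\lambda_\psi, \lambda_x)$. The ontic state of a given quantum state $\psi$ is

$$\mu_\psi(\lambda_\psi, \lambda_x) = \delta(\lambda_\psi - \psi),$$

and the response functions 8 are

$$\xi_{0|\Pi}(\lambda_\psi, \lambda_x) = \Theta(\operatorname{tr}\lambda_\psi\Pi_0 - \lambda_x), \qquad \xi_{1|\Pi}(\lambda_\psi, \lambda_x) = 1 - \xi_{0|\Pi}(\lambda_\psi, \lambda_x),$$

where $\Theta$ is the Heaviside step function, defined by

$$\Theta(x) = \begin{cases} 1 & x \geq 0, \\ 0 & x < 0. \end{cases}$$

One then recovers the quantum statistics by uniform averaging over the ontic space:

$$p(0|\psi,\Pi) = \int_{\mathcal{PH}} d\lambda_\psi \int_0^1 d\lambda_x\, \delta(\lambda_\psi - \psi)\,\Theta(\operatorname{tr}\lambda_\psi\Pi_0 - \lambda_x) = \int_0^1 d\lambda_x\, \Theta(\operatorname{tr}\psi\Pi_0 - \lambda_x) = \operatorname{tr}\psi\Pi_0.$$

The reader might have noticed that although the model claims to work only for a qubit, the mathematical formalism does not make any reference to this, and one might be tempted to think that it actually works for any two-outcome PVM. The reason it does not is more subtle, and we shall see it in section 1.7.

8 Note that the response functions depend explicitly on the labels of the projectors, so it would be desirable to set a consistent ordering convention to avoid giving different results to $\{|0\rangle\langle 0|, |1\rangle\langle 1|\}$ and $\{|1\rangle\langle 1|, |0\rangle\langle 0|\}$.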
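The averaging step of the Bell model can also be checked numerically. The Python sketch below (standard library only) evaluates the deterministic response function $\xi_{0|\Pi}$ on a midpoint grid for $\lambda_x$ and compares the average with $\operatorname{tr}\psi\Pi_0$. We take $\Pi_0 = |0\rangle\langle 0|$ and $|\psi\rangle = \cos t|0\rangle + \sin t|1\rangle$, which are illustrative choices; the grid size and helper names are ours.

```python
# Sketch of the modified Bell model for one qubit PVM {Pi_0, Pi_1}. The ontic
# variable is (psi, x) with x uniform on [0, 1]; outcome 0 is obtained iff
# x <= tr(psi Pi_0). Theta(0) = 1 is a convention (a measure-zero choice).
import math

def heaviside(t):
    return 1.0 if t >= 0 else 0.0

def xi0(born_prob, x):
    """Deterministic response for outcome 0: Theta(born_prob - x), born_prob = tr(psi Pi_0)."""
    return heaviside(born_prob - x)

def xi1(born_prob, x):
    """Response for outcome 1, so that xi0 + xi1 = 1 for every ontic state."""
    return 1.0 - xi0(born_prob, x)

def averaged_prob(born_prob, n=100_000):
    """Midpoint-rule average of xi0 over x in [0, 1] (uniform averaging over lambda_x)."""
    return sum(xi0(born_prob, (k + 0.5) / n) for k in range(n)) / n

# Illustrative choice: |psi> = cos(t)|0> + sin(t)|1>, Pi_0 = |0><0|, so tr(psi Pi_0) = cos(t)^2.
t = 0.4
q = math.cos(t) ** 2
print(q, averaged_prob(q))  # agree up to the grid spacing 1/n
```

For each fixed $(\lambda_\psi, \lambda_x)$ the outcome is fixed; only the average over $\lambda_x$ is probabilistic.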
1.3 $\psi$-ontic and $\psi$-epistemic models

Both models presented in the previous section share a common feature: the quantum state has an ontological status. Either the ontic state is the quantum state itself, as in the naive model, or it is the quantum state supplemented by a real number in the unit interval, as in the Bell model. In both cases, knowing the (pure) ontic state $\lambda$ of the system is enough to determine uniquely the (pure) quantum state that was prepared. Models of this kind are called 9 $\psi$-ontic, and have the following equivalent but more operational definition:

Definition 2. An ontological model is $\psi$-ontic if for different quantum states $\varphi$ and $\psi$ the ontic states have disjoint support, i.e.,

$$\varphi \neq \psi \implies \mu_\varphi(\lambda)\,\mu_\psi(\lambda) = 0 \quad \forall\lambda.$$

To motivate this definition it might be useful to draw an analogy with classical mechanics: there, an ontic state is a point in phase space, and its ontic properties (like energy or momentum) are functions of the phase-space point. Likewise, anything that is uniquely determined by the ontic state in an ontological theory should be regarded as ontic itself, as a change in it requires a change of the underlying ontic state. As the quantum state is uniquely determined by the ontic state in $\psi$-ontic models, it has to be regarded as ontic, as it is not possible to change it without changing the underlying ontic state.

Apart from conceptual clarity, one reason to make this definition is that it is easy to see that $\psi$-ontic models necessarily require instant transfer of information 10. In the case where $\psi$ is the whole ontic state, it suffices to consider a measurement on an entangled state: Alice and Bob share $|\varphi^+\rangle = |00\rangle + |11\rangle$ and are spatially separated; Alice then measures the PVM $\{|0\rangle\langle 0|, |1\rangle\langle 1|\}$ and obtains, e.g., the result 0. Bob's state then changes instantly from the (unnormalized) maximally mixed state $\mathbb{1}$ to $|0\rangle$, violating causality. Of course, if $\psi$ is not the whole ontic state, there is no need for a violation of causality: $\lambda$ can tell us that the state of Bob's system actually was $|0\rangle$ all along, and so the ontic state does not change during the measurement.

To deal with the case where $\psi$ is not the whole ontic state, we need the EPR gedankenexperiment 11 [13]: suppose that Alice can also measure the PVM $\{|+\rangle\langle +|, |-\rangle\langle -|\}$; then after her measurement Bob's state will belong to the set $\{|0\rangle, |1\rangle\}$ if she measures the first PVM, or to the set $\{|+\rangle, |-\rangle\}$ if she measures the second. Even if the result of any given measurement can be predetermined by $\lambda$, $\lambda$ cannot tell which measurement was made 12. Since Bob's quantum state does depend on which measurement was made (the four possibilities are all different), the formalism again needs instant transfer of information.
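A small numerical check of this argument: the Python sketch below (standard library only; real amplitudes, which suffice for $|\varphi^+\rangle$ and the two PVMs above, and the state normalized here unlike in the text; helper names are ours) computes Bob's conditional states for each of Alice's outcomes. The conditional states are the four different vectors $|0\rangle, |1\rangle, |+\rangle, |-\rangle$, while Bob's average state is $\mathbb{1}/2$ for either choice of Alice, consistent with footnote 10: the dependence on Alice's choice lives only in the formalism.

```python
# Sketch: Bob's conditional states in the EPR scenario, with the normalized state
# |phi+> = (|00> + |11>)/sqrt(2). Real amplitudes only; helper names are ours.

s = 2 ** -0.5
phi_plus = [s, 0.0, 0.0, s]  # amplitudes of |00>, |01>, |10>, |11>

def bob_conditional(alice_vec):
    """Alice projects her qubit onto alice_vec; return (probability, Bob's normalized vector)."""
    # Unnormalized Bob vector: b_j = sum_i a_i * amplitude(|i j>)
    b = [sum(alice_vec[i] * phi_plus[2 * i + j] for i in range(2)) for j in range(2)]
    prob = b[0] ** 2 + b[1] ** 2
    norm = prob ** 0.5
    return prob, [x / norm for x in b]

def average_state(outcomes):
    """Bob's density matrix averaged over Alice's outcomes: sum_k p_k |b_k><b_k|."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for prob, b in outcomes:
        for i in range(2):
            for j in range(2):
                rho[i][j] += prob * b[i] * b[j]
    return rho

z_basis = [[1.0, 0.0], [0.0, 1.0]]   # Alice measures {|0><0|, |1><1|}
x_basis = [[s, s], [s, -s]]          # Alice measures {|+><+|, |-><-|}
z_out = [bob_conditional(a) for a in z_basis]
x_out = [bob_conditional(a) for a in x_basis]

print([b for _, b in z_out])   # |0> and |1>
print([b for _, b in x_out])   # |+> and |-> (up to rounding)
print(average_state(z_out), average_state(x_out))  # both close to 1/2
```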

Another way to avoid the violation of causality is to say that $\psi$ is not ontic, but merely the representation of Alice's knowledge of reality, i.e., epistemic.

9 The concepts of ontic and epistemic states were first introduced in [ ], and further formalized in [8, 11]. A nice discussion of these concepts can be found in [12].

10 Only in the formalism, of course; if they displayed an observable violation of causality, that would contradict quantum mechanics.

11 The version presented here is Einstein's version, reproduced in [8].

12 Indeed, it could conceivably determine which measurement Alice will make; here we are using the assumption that she has free will.
Then what changed after the measurement was actually just what Alice knew about Bob's state, which is in fact a quite reasonable proposition. But this amounts to giving up $\psi$-ontic models in favour of $\psi$-epistemic ones 13:

Definition 3. An ontological model is $\psi$-epistemic if it is not $\psi$-ontic.

Again, an analogy with classical mechanics might be useful: the classical mixed state is a probability distribution over phase space, and it is interpreted as epistemic, as it merely expresses ignorance about which phase-space point the system really occupies. This is only possible because there is no restriction on the overlaps of different mixed states, i.e., the same phase-space point can belong to the support of numerous different mixed states. Notice that this definition is quite weak compared to the classical case: it only requires that there be one pair $\varphi, \psi$ whose ontic states $\mu_\varphi$ and $\mu_\psi$ share at least one $\lambda$ in their support.

The obvious question to ask: is there a $\psi$-epistemic model?

1.3.1 The Kochen-Specker model

Even before this question was raised, it had already been answered by Simon Kochen and Ernst Specker [ ], through the ontological model they constructed as a counterexample to von Neumann's theorem [15]. It seems that the authors were trying to make a model that was somewhat physically plausible, and ended up making a $\psi$-epistemic model. We present it here as rendered in [ ].

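The ontic space $\Lambda$ is the unit sphere $S^2$, and we shall use the Bloch vectors $\vec\psi$ and $\vec\varphi$ to represent a pure state $\psi$ and a measurement projector $\varphi$ in $S^2$ as well, defined via the isomorphism $\psi = \frac{1}{2}(\mathbb{1} + \vec\psi\cdot\vec\sigma)$. The ontic state is then

$$\mu_\psi(\lambda) = \frac{1}{\pi}\,\Theta(\vec\psi\cdot\lambda)\,\vec\psi\cdot\lambda,$$

making the model clearly $\psi$-epistemic, since the only states that do not overlap are orthogonal states. The response function is given by

$$\xi_\varphi(\lambda) = \Theta(\vec\varphi\cdot\lambda).$$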

To recover the quantum statistics, notice that $\mu_\psi$ and $\xi_\varphi$ have as supports the hemispheres centred on $\vec\psi$ and $\vec\varphi$, respectively, so their intersection defines a spherical lune. To take advantage of this, let's choose coordinates such that $\vec\psi$ and $\vec\varphi$ lie on the equator of $S^2$, so that $\vec\psi = (\cos\psi, \sin\psi, 0)$, $\vec\varphi = (\cos\varphi, \sin\varphi, 0)$, and



13 It is interesting to notice that although we've known this since 1935, the first ontological models were all $\psi$-ontic.
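$\lambda = (\sin\theta\cos\chi, \sin\theta\sin\chi, \cos\theta)$. We have then, taking $0 \leq \varphi - \psi \leq \pi$ without loss of generality,

$$\begin{aligned}
p(\varphi|\psi) &= \int_\Lambda d\lambda\, \frac{1}{\pi}\,\Theta(\vec\psi\cdot\lambda)\,\vec\psi\cdot\lambda\,\Theta(\vec\varphi\cdot\lambda) \\
&= \frac{1}{\pi}\int_0^\pi d\theta\,\sin\theta \int_0^{2\pi} d\chi\, \Theta(\sin\theta\cos(\chi-\psi))\,\sin\theta\cos(\chi-\psi)\,\Theta(\sin\theta\cos(\chi-\varphi)) \\
&= \frac{1}{\pi}\int_0^\pi d\theta\,\sin^2\theta \int_0^{2\pi} d\chi\, \Theta(\cos(\chi-\psi))\cos(\chi-\psi)\,\Theta(\cos(\chi-\varphi)) \\
&= \frac{1}{2}\int_{\varphi-\pi/2}^{\psi+\pi/2} d\chi\, \cos(\chi-\psi) \\
&= \frac{1}{2}\left(1 - \sin(\varphi - \psi - \pi/2)\right) \\
&= \frac{1}{2}\left(1 + \cos(\varphi - \psi)\right) \\
&= \operatorname{tr}\psi\varphi.
\end{aligned}$$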
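The integral can be cross-checked numerically. The Python sketch below (standard library only; grid sizes, test vectors, and helper names are arbitrary choices of ours) integrates $\mu_\psi\,\xi_\varphi$ over $S^2$ with a midpoint rule in $(\theta,\chi)$ and compares the result with $\frac{1}{2}(1 + \vec\psi\cdot\vec\varphi)$. Because the integrand jumps on the boundary of the lune, agreement is only up to a discretization error of the order of the grid spacing.

```python
# Sketch: numerical check of the Kochen-Specker model on the Bloch sphere.
# mu_psi(lam) = Theta(psi.lam) (psi.lam) / pi and xi_phi(lam) = Theta(phi.lam);
# integrating their product over S^2 should give (1 + psi.phi) / 2 = tr(psi phi).
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ks_probability(psi, phi, n_theta=300, n_chi=600):
    """Midpoint-rule integral of mu_psi * xi_phi over the unit sphere."""
    d_theta, d_chi = math.pi / n_theta, 2 * math.pi / n_chi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        st, ct = math.sin(theta), math.cos(theta)
        for j in range(n_chi):
            chi = (j + 0.5) * d_chi
            lam = (st * math.cos(chi), st * math.sin(chi), ct)
            a, b = dot(psi, lam), dot(phi, lam)
            if a > 0 and b > 0:  # inside both hemispheres, i.e. the lune
                # area element dOmega = sin(theta) dtheta dchi
                total += (a / math.pi) * st * d_theta * d_chi
    return total

psi = (1.0, 0.0, 0.0)
phi = (math.cos(1.0), math.sin(1.0), 0.0)
print(ks_probability(psi, phi), (1 + dot(psi, phi)) / 2)
```

Setting $\vec\varphi = \vec\psi$ checks the normalization of $\mu_\psi$, and $\vec\varphi = -\vec\psi$ (orthogonal states) gives exactly zero.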

This model does seem to be the most "natural" of the ontological models considered so far, and there have even been attempts to understand it physically [16]. In the same article, Terry Rudolph explores extensions of the Kochen-Specker model to higher dimensions, but fails to reproduce quantum mechanics precisely with them. Although a $\psi$-epistemic model for higher dimensions has since been found (we discuss it in section 1.9.2), it does not have the simplicity of the Kochen-Specker model, and so it would be unfair to call it an extension of it.

1.3.2 Two theorems on $\psi$-epistemic models

We can see, then, that $\psi$-epistemic models are desirable and can actually be constructed. There are, however, two theorems saying that any such model, if it exists, has to be very unnatural. Both are based on the following idea:

Lemma 4. If there are quantum states $\psi_i$ and a measurement $E = \{E_i\}$ such that $\operatorname{tr}\psi_i E_i = 0$ for all $i$, then there can be no $\lambda_0$ in the support of all $\mu_{\psi_i}$.

Proof. If these conditions are satisfied, then it must be true that

$$\int_\Lambda d\lambda\, \mu_{\psi_i}(\lambda)\,\xi_{i|E}(\lambda) = 0,$$

and therefore that $\xi_{i|E}(\lambda) = 0$ for all $\lambda$ in the support of $\mu_{\psi_i}$. If there is a $\lambda_0$ in the support of all the $\mu_{\psi_i}$, making the model $\psi$-epistemic, then $\sum_i \xi_{i|E}(\lambda_0) = 0$, an absurdity, since in the definition of the response functions we require that $\sum_i \xi_{i|E}(\lambda) = 1$ for all $\lambda$. □

Of course, if we could prove that the hypotheses of the lemma are satisfied for any pair of states, we would have proven that no $\psi$-epistemic model is possible; but for a pair of states the hypotheses of the lemma are satisfied only if the states are orthogonal, and orthogonal states must have disjoint support anyway:
Lemma 5. If there are quantum states $\psi_0$, $\psi_1$ and a measurement $\{E, \mathbb{1} - E\}$ such that $\operatorname{tr}\psi_0 E = \operatorname{tr}\psi_1(\mathbb{1} - E) = 0$, then $\psi_0\psi_1 = 0$.

Proof. $\operatorname{tr}\psi_1(\mathbb{1} - E) = 0 \implies \operatorname{tr}\psi_1 E = 1$, so the support of $\psi_1$ is contained in the support of $E$. But $\operatorname{tr}\psi_0 E = 0$ implies that the supports of $\psi_0$ and $E$ are disjoint, and therefore the supports of $\psi_0$ and $\psi_1$ are disjoint, so $\psi_0\psi_1 = 0$. □

Instead, the two theorems we shall present consider larger families: the first considers families of three or more states, showing that there are non-trivial examples, and the second argues that the existence of some specific families implies that any $\psi$-epistemic model must be very unnatural.

Theorem 6 (Caves, Fuchs, Schack [17]). If the convex hull of a family of states $\psi_i$ contains $\mathbb{1}/d$, where $d$ is the Hilbert space dimension, then there can be no $\lambda_0$ in the common support of all $\mu_{\psi_i}$.

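Proof. For any state $\psi_i$, it is true that $\operatorname{tr}\psi_i(\mathbb{1} - \psi_i) = 0$. If we can find coefficients $a_i$ such that $\{a_i(\mathbb{1} - \psi_i)\}$ is a POVM, then lemma 4 applies and we're done. What we need is

$$\sum_i a_i(\mathbb{1} - \psi_i) = \mathbb{1}$$

for $a_i \geq 0$. Taking the trace on both sides we get that $\sum_i a_i = \frac{d}{d-1}$. Simple algebra then shows us that

$$\sum_i \frac{d-1}{d}\,a_i\,\psi_i = \frac{\mathbb{1}}{d},$$

so suitable $a_i$ exist precisely when $\mathbb{1}/d$ is a convex combination of the $\psi_i$, with weights $p_i = \frac{d-1}{d}a_i$. □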
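A worked instance of the theorem, as a Python sketch (standard library only; helper names are ours): we take the three "trine" qubit states whose Bloch vectors lie 120° apart in the $x$-$z$ plane, which is our choice of example and not necessarily the one in equations (1.6). Their uniform mixture is $\mathbb{1}/2$, so with $d = 2$ and $p_i = 1/3$ the coefficients $a_i = \frac{d}{d-1}p_i = 2/3$ should make $\{a_i(\mathbb{1} - \psi_i)\}$ a POVM that gives probability zero to outcome $i$ on state $\psi_i$, even though the states are pairwise non-orthogonal.

```python
# Sketch: Theorem 6 for the three "trine" qubit states (Bloch vectors 120 degrees
# apart). Real 2x2 matrices as nested lists; helper names are ours.
import math

def projector(t):
    """|psi><psi| for |psi> = cos(t/2)|0> + sin(t/2)|1> (Bloch angle t in the x-z plane)."""
    c, s = math.cos(t / 2), math.sin(t / 2)
    return [[c * c, c * s], [c * s, s * s]]

def tr_product(x, y):
    """tr(x y) for 2x2 matrices."""
    return sum(x[r][k] * y[k][r] for r in range(2) for k in range(2))

d = 2
psis = [projector(t) for t in (0.0, 2 * math.pi / 3, 4 * math.pi / 3)]
p = [1 / 3] * 3                        # convex weights with sum_i p_i psi_i = 1/d
a = [d * p_i / (d - 1) for p_i in p]   # a_i = d p_i / (d - 1) = 2/3

identity = [[1.0, 0.0], [0.0, 1.0]]
effects = [[[a_i * (identity[r][k] - psi[r][k]) for k in range(2)] for r in range(2)]
           for a_i, psi in zip(a, psis)]

povm_sum = [[sum(e[r][k] for e in effects) for k in range(2)] for r in range(2)]
print(povm_sum)                                               # close to the identity
print([tr_product(psi, e) for psi, e in zip(psis, effects)])  # all close to 0
print(tr_product(psis[0], psis[1]))                           # 1/4: not orthogonal
```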

This theorem was first proven in [ ], with a different objective. While it does not exclude $\psi$-epistemic models, it shows that there is a wide variety of families of states that cannot have a common overlap. Already with three states there are examples in any dimension where the states are not orthogonal; see equations (1.6) for an example.

The next theorem needs the following (very natural, in the author's opinion) assumption about the composition of different systems:

Assumption 1. If two quantum states $\varphi$ and $\psi$ are prepared independently, such that their joint state is $\varphi\otimes\psi$, then the corresponding ontic state for the joint system is $\mu_{\varphi\otimes\psi}(\lambda_A,\lambda_B) = \mu_\varphi(\lambda_A)\,\mu_\psi(\lambda_B)$.

Theorem 7 (Pusey, Barrett, Rudolph 14 [18]). Given assumption 1, no $\psi$-epistemic ontological model of quantum mechanics is possible.

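Proof. Consider the four quantum states $\varphi_0\otimes\varphi_0$, $\varphi_0\otimes\varphi_1$, $\varphi_1\otimes\varphi_0$, and $\varphi_1\otimes\varphi_1$. If there is a $\lambda_0$ in the support of both $\mu_{\varphi_0}$ and $\mu_{\varphi_1}$, then $(\lambda_0,\lambda_0)$ is in the support of all four $\mu_{\varphi_i\otimes\varphi_j}(\lambda_A,\lambda_B)$. If there is a POVM $\{E_{ij}\}$ such that $\operatorname{tr}(\varphi_i\otimes\varphi_j)E_{ij} = 0$, then lemma 4 applies and we're done.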

14 We consider this article to be written in a very misleading way, so be warned not to take it seriously. For a sober version of it, see [12].
Consider now the particular case $|\varphi_0\rangle = |0\rangle$ and $|\varphi_1\rangle = |+\rangle$. Then if $E_{ij}$ is the projector onto $|E_{ij}\rangle = |\varphi_i\varphi_j^\perp\rangle + |\varphi_i^\perp\varphi_j\rangle$, it is easy to see that

$$\operatorname{tr}(\varphi_i\otimes\varphi_j)E_{ij} = \langle\varphi_i\varphi_j|E_{ij}|\varphi_i\varphi_j\rangle = 0,$$

and it is also easy (but tedious) to check that $\sum_{ij} E_{ij} = \mathbb{1}$. Unfortunately, this simple strategy only works for this pair of states, and states with larger overlap require measurements on a larger number of systems. For the proof of the general case, see the original article [18]. □
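The "easy but tedious" check can be delegated to a few lines of Python (standard library only; helper names are ours). The sketch builds the four vectors $|E_{ij}\rangle$ for $|\varphi_0\rangle = |0\rangle$, $|\varphi_1\rangle = |+\rangle$, with $|0\rangle^\perp = |1\rangle$ and $|+\rangle^\perp = |-\rangle$, normalized by $1/\sqrt{2}$ (the text omits normalization), and verifies that each is orthogonal to the corresponding product state and that the four projectors sum to the identity.

```python
# Sketch: numerical check of the measurement in the proof of Theorem 7 for
# |phi_0> = |0>, |phi_1> = |+>. Real amplitudes; basis order |00>, |01>, |10>, |11>.

s = 2 ** -0.5
phi = [[1.0, 0.0], [s, s]]       # |phi_0> = |0>, |phi_1> = |+>
perp = [[0.0, 1.0], [s, -s]]     # |0>^perp = |1>, |+>^perp = |->

def kron(u, v):
    """Tensor product of two vectors."""
    return [x * y for x in u for y in v]

def inner(u, v):
    return sum(x * y for x, y in zip(u, v))

# |E_ij> = (|phi_i phi_j^perp> + |phi_i^perp phi_j>) / sqrt(2)
E = {(i, j): [s * (x + y) for x, y in zip(kron(phi[i], perp[j]), kron(perp[i], phi[j]))]
     for i in range(2) for j in range(2)}

# tr[(phi_i x phi_j) E_ij] = |<phi_i phi_j|E_ij>|^2 should vanish for every (i, j).
overlaps = {ij: inner(kron(phi[ij[0]], phi[ij[1]]), e) ** 2 for ij, e in E.items()}

# Completeness: sum_ij |E_ij><E_ij| should be the 4x4 identity.
total = [[sum(e[r] * e[c] for e in E.values()) for c in range(4)] for r in range(4)]

print(overlaps)
print(total)
```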

This theorem has two immediate corollaries:

Corollary 8. Any ontological model of quantum mechanics must violate causality.

One only has to notice that since the theorem excludes $\psi$-epistemic models, we're left with $\psi$-ontic ones, and we have shown that those violate causality at the beginning of this section.

Corollary 9. The ontic state space $\Lambda$ is uncountable.

In a $\psi$-ontic model there is an injection of $\mathcal{PH}$ into $\Lambda$. Since $\mathcal{PH}$ is uncountable, $\Lambda$ must be uncountable. In fact, even without assumption 1 we can still prove that $\Lambda$ is infinite; we shall do this in section 1.8.

The obvious question that this theorem raises is: can we do away with assumption 1 and prove once and for all that $\psi$-epistemic models are always impossible? The existence of the Kochen-Specker model already hints that at least some weaker assumption is needed, since it is a bona fide $\psi$-epistemic model. Of course, its existence does not contradict the theorem, since the theorem only forbids models for dimension 4 or greater. In fact, soon after the Pusey-Barrett-Rudolph theorem was published, some of the same authors showed that without assumption 1 they could construct a $\psi$-epistemic model for a quantum system of any dimension. We shall describe this model in section 1.9.2.

This theorem already hints at a theme that shall recur in the search for ontological models: we can in fact make ontological models for quantum theory, and we can make them almost any way we like, but there's a price to pay: the various aspects of the model become more and more intertwined. We can't really talk of independent quantum systems, of a separation between state and experiment, nor even (as we shall see in the next section) of a measurement outcome without talking about the whole experiment. Of course, this bodes very ill for the idea of ontological models: in the extreme limit of this interdependence our ontological model only lists possible experiments and their results, without ever trying to make sense of them in a simpler and more general theory. A model like this could never be refuted by experiment, but precisely because of this it is a perversion of the scientific method [ ], and should therefore be rejected on methodological grounds.

What we seek, therefore, is not just any ontological model, but one that has some plausibility. The ontological models presented so far are of course very contrived, but by themselves they should not be taken as evidence against the possibility of a reasonable ontological model, since they were conceived only as proofs of principle, without any inspiration from physical grounds.

1.4 Contextuality 

One should contrast the state of research into contextuality with the state of research into nonlocality. It is quite clear that nonlocality has a better status: it was subjected to experimental tests much earlier¹⁵, and its potential as a resource for practical applications was also recognized much earlier¹⁶.

This state of affairs has many causes, which certainly include the intuitive appeal of nonlocality via its relation with relativity, but I'd like to focus on a more formal one: the definitions of nonlocality and contextuality. Right in the first paper about nonlocality, John Bell [ ] already gave a clear operational definition of nonlocality, which depended not on quantum theory but only on a general probabilistic framework. By contrast, the first definition of contextuality, also due to John Bell¹⁷, was very specific to quantum theory, and was not at all operational:

Definition 10 (Bell's contextuality). We say that an ontological model for quantum theory is noncontextual if the response function associated to the outcome $k$ of a PVM $M = \{\Pi_j\}$, i.e., $\xi_{k|M}(\lambda)$, depends only on $\Pi_k$ and not on the whole $M$.

This definition also lacks conceptual clarity: John Bell even thought that it 
was reasonable for a physical theory to be contextual [ ]: 

The result of an observation may reasonably depend not only on 
the state of the system (including hidden variable) but also on the 
complete disposition of the apparatus. 

But one consequence of contextuality is precisely the violation of causality that he abhorred: consider, for instance, the bipartite measurement

$$M = \{\Pi_0\otimes\mathbb{1},\ \Pi_1\otimes\mathbb{1},\ \mathbb{1}\otimes\Pi_0,\ \mathbb{1}\otimes\Pi_1\},$$

in which one PVM is measured on each side. If the actual result $\xi_{0|M}(\lambda)$ associated with the projector $\Pi_0\otimes\mathbb{1}$ depends on whether the other side measures $\mathbb{1}\otimes\Pi_0, \mathbb{1}\otimes\Pi_1$ or $\mathbb{1}\otimes\tilde\Pi_0, \mathbb{1}\otimes\tilde\Pi_1$, then the apparatuses must always be able to communicate their arrangement to each other, even when the choice of arrangement is made at spacelike separation, which is of course absurd. This settles the question for ontological models of independent quantum systems. But what about single systems? Is there any unacceptable consequence of contextuality for them?

Yes! It also implies a violation of causality. As put by Asher Peres and Amiran Ron [26]:

15 1972 [20], in contrast with 2000 [21].
16 1991 [ ], versus 2000 [23]. 

17 The concept first appeared in 1966 [4], in a critique of the Gleason theorem, whereas the name "contextuality" was coined in 1978 [2-] by Clauser and Shimony.
More generally, if $[A, B] = [A, C] = 0$ but $[B, C] \neq 0$, suppose that we measure A first and only a later time decide whether to measure B or C or none of them. How can the outcome of the measurement A depend on this future decision?

Furthermore, this whole story about communicating apparatuses is quite queer, even when it does not violate causality. After all, all the evidence we have is that measurements of commuting observables do not affect each other, and an ontological theory that requires this kind of communication would be very weird indeed. Another problem is that this communication could affect only the individual outcomes $\xi_{j|M}(\lambda)$, and must never be detectable in the quantum experiments we do. To postulate this kind of "cryptocontextuality"¹⁸ seems very unscientific: we would be making a theory which is about precisely what we can't measure.

Another way to think about the weirdness of a contextual model is operationally: imagine that you are an experimentalist who has implemented an apparatus that can differentiate between the ground state and the excited states of a many-level atom. You test it hard, repeat your experiment many times with different input states, gather the statistics, and become confident that your apparatus is quite trustworthy; you now want to teach an experimentalist friend how to build a similar apparatus. Quite simple, isn't it? You just tell him how you did it, ask him to gather statistics, and compare with yours: if the statistics match, you've implemented the same experiment. Except that this is not so if your physical theory is contextual: the statistics of the projector $\Pi_0$ (the projector onto the ground state) are not enough to determine the results of the experiment, since according to definition 10 the actual results $\xi_{0|M}(\lambda)$ depend on the rest of the (unmeasured) projectors; and these are not only the higher energy levels of the atom, but can in principle include any environmental data, such as the apparatus' mass, the local weather, whether Virgo is ascendant. . .

In this way, we are rendered incapable of comparing experiments and establishing patterns, the very foundation of our scientific method. Notice the strong parallel between this discussion and the definitions of state and observable in the C*-algebraic axiomatization given by Franco Strocchi [ ]. This motivates a new definition of contextuality, due to Spekkens [ ], that takes these arguments into account:

A noncontextual ontological model of an operational theory is one wherein if two experimental procedures are operationally equivalent, then they have equivalent representations in the ontological model.

Under this reasoning, having equivalent statistics becomes sufficient to identify different experiments, and we are again able to do science. But a definition that uses only words is quite imprecise, and we should codify it mathematically in order to avoid misinterpretations:



18 With apologies to Asher Peres.
Definition 11 (Spekkens' contextuality). Let $p(k|P,M)$ be the probability of obtaining the outcome $k$ when doing the measurement $M$ on a state prepared via procedure $P$. Then we say that an ontological model of an operational theory is measurement noncontextual if

$$p(k|P,M) = p(k|P,M')\ \ \forall P \implies M = M'. \tag{1.3}$$

Analogously, we say that an ontological model of an operational theory is preparation noncontextual if

$$p(k|P,M) = p(k|P',M)\ \ \forall M \implies P = P'. \tag{1.4}$$

The central idea is simple: if measurements M and M' give the same 
statistics for every preparation procedure P, then we must say that they are 
in fact the same measurement, with equivalent mathematical representation, 
and if preparation procedures P and P' give the same statistics for every 
measurement M, then we must say that they are in fact the same preparation 
procedure, with equivalent mathematical representation. 

Note that this definition improves on Bell's definition by removing any explicit reference to quantum theory, referring only to an "operational theory", i.e., a theory in which we can talk about preparation procedures, measurements, and probabilities. However, this is still not the definition we're looking for: we want to be able to say whether a given probability distribution is contextual or not, as we do with the definition of nonlocality. This we shall do in the next chapter; for this one, the present definition is good enough.

We want to specialize this definition to ontological models of quantum theory, as a matter of convenience, since that's all we'll be talking about. Note that in quantum theory $p(k|P,M) = \operatorname{tr}\rho M_k$ is completely determined by the measurement operator $M_k$ and the quantum state $\rho$, so that's all our ontological model may take into account. More precisely:

Definition 12. We say that an ontological model of quantum theory is measurement noncontextual if

$$\xi_{k|M}(\lambda) = \xi_{M_k}(\lambda),$$

that is, if the response function associated to the outcome $k$ of a measurement $M$ depends only on the measurement operator $M_k$. Analogously, we say that an ontological model of quantum theory is preparation noncontextual if

$$\mu_P(\lambda) = \mu_\rho(\lambda),$$

that is, if the ontic state associated to the preparation procedure $P$ depends only on the quantum state $\rho$ that is prepared.

What else could the ontic state $\mu_P(\lambda)$ possibly depend on? Well, in the ontological models we discussed in sections 1.2.1 and 1.2.2 it depended on the "true" basis of $\rho$, making these models preparation contextual. It could also depend on the "true" purification of $\rho$, or really on anything that one might deem plausible or implausible. What about measurements? Well, the most famous sort of context is that of Bell's definition of contextuality, the whole PVM $M$, which is what the ontological models discussed in section 1.9.2 depend on; but it could also be anything, such as the colour of the measurement apparatus, the latitude and longitude of the laboratory where the experiment is performed, etc.

One final remark: if quantum theory were an ontological model of itself, then definitions 11 and 12 would imply that it is noncontextual, since it is trivial to prove that

$$\operatorname{tr}\rho M_k = \operatorname{tr}\rho M'_k\ \ \forall\rho \implies M_k = M'_k$$

and

$$\operatorname{tr}\rho M_k = \operatorname{tr}\sigma M_k\ \ \forall M_k \implies \rho = \sigma.$$

Since quantum theory is therefore not contextual in this sense, the oft-heard claim that "quantum mechanics is contextual" is just meaningless. What one probably means by it is that any ontological model of quantum theory must be contextual, repeating a situation that happens in the area of nonlocality: quantum mechanics is obviously a local theory, in the relativistic sense, but any ontological model of quantum theory must be nonlocal, leading to the meaningless sentence "quantum mechanics is nonlocal".

1.5 Contextuality for preparation procedures 

In this section we shall show that it is not possible to construct a preparation noncontextual ontological model of quantum theory [ ]. This is not the conflict with quantum theory usually discussed, but we feel that it is appropriate to begin with it for three reasons:

1. It is independent of assumptions on determinism 

2. It is simple 

3. It is novel 

To begin, we'll need to prove a simple lemma about how orthogonal states are represented in the ontic space $\Lambda$. We'll see that the possibility of distinguishing orthogonal states with certainty by a single-shot measurement implies that their representations in the ontic space must have disjoint support.

Lemma 13. If two quantum states $\rho$ and $\sigma$ are orthogonal then the corresponding ontic states $\mu_\rho$ and $\mu_\sigma$ have disjoint support:

$$\rho\sigma = 0 \implies \mu_\rho(\lambda)\mu_\sigma(\lambda) = 0 \quad \forall\lambda.$$

Proof. If $\rho$ and $\sigma$ are orthogonal, then they can be distinguished with certainty in a single-shot measurement. To construct one such measurement, note that the supports of $\rho$ and $\sigma$ must be orthogonal, and let $\Pi_\rho$ be the projector onto the support of $\rho$. Then

$$\operatorname{tr}\rho\Pi_\rho = 1 \quad\text{and}\quad \operatorname{tr}\sigma\Pi_\rho = 0.$$
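Writing these measurements ontologically we have

$$\int_\Lambda \mu_\rho(\lambda)\,\xi_{\Pi_\rho}(\lambda)\,d\lambda = 1 \quad\text{and}\quad \int_\Lambda \mu_\sigma(\lambda)\,\xi_{\Pi_\rho}(\lambda)\,d\lambda = 0,$$

so $\xi_{\Pi_\rho}(\lambda) = 1$ for all $\lambda$ in the support of $\mu_\rho$, and $\xi_{\Pi_\rho}(\lambda) = 0$ for all $\lambda$ in the support of $\mu_\sigma$; hence the supports of $\mu_\rho$ and $\mu_\sigma$ are disjoint, and $\mu_\rho(\lambda)\mu_\sigma(\lambda) = 0$ for all $\lambda$. □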

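We will also need the assumption that is violated by all the ontological models discussed so far:

Assumption 2 (Preparation noncontextuality).

$$\sum_i p_i\rho_i = \sum_i q_i\sigma_i \implies \mu_{\{(p_i,\rho_i)\}}(\lambda) = \mu_{\{(q_i,\sigma_i)\}}(\lambda),$$

that is, different convex decompositions of the same density operator are represented by the same ontic state.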

With the groundwork laid, we can now state the theorem and prove it. 

Theorem 14 (Spekkens [11]). It is not possible to embed quantum theory into a preparation noncontextual ontological theory.

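Proof. Let $\phi$, $\Phi$, $\chi$, $X$, $\psi$, and $\Psi$ be quantum states such that

$$\phi\Phi = \chi X = \psi\Psi = 0 \tag{1.5a}$$

$$\mathbb{1} = \phi + \Phi = \chi + X = \psi + \Psi \tag{1.5b}$$

$$\frac{\mathbb{1}}{2} = \frac{1}{3}(\phi + \chi + \psi) = \frac{1}{3}(\Phi + X + \Psi). \tag{1.5c}$$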

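That such a family of states exists can be proven by exhibiting an example in dimension 2, which can easily be embedded in higher dimensions:

$$|\phi\rangle = |0\rangle, \qquad |\Phi\rangle = |1\rangle \tag{1.6a}$$

$$|\chi\rangle = \tfrac{1}{2}|0\rangle + \tfrac{\sqrt{3}}{2}|1\rangle, \qquad |X\rangle = \tfrac{\sqrt{3}}{2}|0\rangle - \tfrac{1}{2}|1\rangle \tag{1.6b}$$

$$|\psi\rangle = \tfrac{1}{2}|0\rangle - \tfrac{\sqrt{3}}{2}|1\rangle, \qquad |\Psi\rangle = \tfrac{\sqrt{3}}{2}|0\rangle + \tfrac{1}{2}|1\rangle \tag{1.6c}$$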

A nice way to visualize the orthogonality and completeness relations (1.5) is to represent states (1.6) in the $\sigma_x$, $\sigma_z$ plane of the Bloch sphere, as done in figure 1.1.
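A short numerical check that the states (1.6) satisfy relations (1.5), with each state identified with its projector. The Python sketch below is our own illustration; it uses plain nested lists for $2\times 2$ real matrices, and the helper names are hypothetical.

```python
from math import isclose, sqrt

S = sqrt(3) / 2
KETS = {
    "phi": (1.0, 0.0), "Phi": (0.0, 1.0),   # (1.6a)
    "chi": (0.5, S),   "X":   (S, -0.5),    # (1.6b)
    "psi": (0.5, -S),  "Psi": (S, 0.5),     # (1.6c)
}
ZERO = [[0.0, 0.0], [0.0, 0.0]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
HALF_IDENTITY = [[0.5, 0.0], [0.0, 0.5]]


def projector(v):
    """|v><v| for a real 2-component vector."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def combine(terms):
    """Linear combination sum_k c_k M_k of 2x2 matrices given as pairs (c_k, M_k)."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(2)] for i in range(2)]


def same(a, b, tol=1e-12):
    return all(isclose(a[i][j], b[i][j], abs_tol=tol) for i in range(2) for j in range(2))


P = {name: projector(v) for name, v in KETS.items()}
PAIRS = [("phi", "Phi"), ("chi", "X"), ("psi", "Psi")]

# (1.5a): orthogonality of each pair
rel_a = all(same(matmul(P[a], P[b]), ZERO) for a, b in PAIRS)
# (1.5b): each pair resolves the identity
rel_b = all(same(combine([(1, P[a]), (1, P[b])]), IDENTITY) for a, b in PAIRS)
# (1.5c): each triple averages to the maximally mixed state
rel_c = all(same(combine([(1 / 3, P[n]) for n in names]), HALF_IDENTITY)
            for names in (("phi", "chi", "psi"), ("Phi", "X", "Psi")))
print(rel_a, rel_b, rel_c)  # expected: True True True
```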

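Now we shall use lemmas 1 and 13 together with assumption 2 and relations (1.5) to derive a contradiction. Lemma 13 together with (1.5a) implies that

$$\mu_\phi(\lambda)\mu_\Phi(\lambda) = \mu_\chi(\lambda)\mu_X(\lambda) = \mu_\psi(\lambda)\mu_\Psi(\lambda) = 0 \quad \forall\lambda. \tag{1.7}$$

Lemma 1, together with assumption 2 and relations (1.5b), implies that

$$\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{2}\big(\mu_\phi(\lambda) + \mu_\Phi(\lambda)\big) \tag{1.8a}$$

$$\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{2}\big(\mu_\chi(\lambda) + \mu_X(\lambda)\big) \tag{1.8b}$$

$$\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{2}\big(\mu_\psi(\lambda) + \mu_\Psi(\lambda)\big) \tag{1.8c}$$

where $\mu_{\mathbb{1}/2}$ denotes the ontic state associated with the maximally mixed state $\mathbb{1}/2$, and together with relations (1.5c)

$$\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{3}\big(\mu_\phi(\lambda) + \mu_\chi(\lambda) + \mu_\psi(\lambda)\big) \tag{1.9a}$$

$$\mu_{\mathbb{1}/2}(\lambda) = \tfrac{1}{3}\big(\mu_\Phi(\lambda) + \mu_X(\lambda) + \mu_\Psi(\lambda)\big) \tag{1.9b}$$

Figure 1.1: Representation of states (1.6) in the $\sigma_x$, $\sigma_z$ plane of the Bloch sphere. The barycenter of antipodal states, or of states connected by a triangle, is $\mathbb{1}/2$.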

We shall conclude the proof by showing that the only simultaneous solution to (1.7), (1.8), and (1.9) is the all-zero solution

$$\mu_\phi(\lambda) = \mu_\Phi(\lambda) = \mu_\chi(\lambda) = \mu_X(\lambda) = \mu_\psi(\lambda) = \mu_\Psi(\lambda) = 0 \quad \forall\lambda,$$

which is absurd, since probability distributions can't be zero everywhere.

The disjointness relations (1.7) imply that for each $\lambda$ at least one of $\mu_\phi$ and $\mu_\Phi$ must be zero, and the same holds for the other pairs. Therefore there are 8 different cases to examine, although only two are essentially different. The first one is when $\mu_\phi$, $\mu_\chi$, and $\mu_\psi$ are zero. Then (1.9) implies that $\mu_\Phi$, $\mu_X$, and $\mu_\Psi$ must also be zero. The second case is when $\mu_\Phi$, $\mu_\chi$, and $\mu_\psi$ are zero. Then (1.8a) implies that $\mu_{\mathbb{1}/2} = \frac{1}{2}\mu_\phi$ and (1.9a) implies that $\mu_{\mathbb{1}/2} = \frac{1}{3}\mu_\phi$. But the only solution to $\frac{1}{2}\mu_\phi = \frac{1}{3}\mu_\phi$ is $\mu_\phi = 0$, and we can apply the previous argument to show that all probability distributions must be zero. The six remaining cases are simply relabellings of these two.

As the above argument applies to every $\lambda$, we have that all probability distributions are zero for every $\lambda$, and thus are not probability distributions.

□ 
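At each fixed $\lambda$, the relations used in this proof form a homogeneous linear system in seven unknowns, so the case analysis can also be checked mechanically. The following Python sketch is our own illustration: the unknown `half` stands for $\mu_{\mathbb{1}/2}(\lambda)$, the three equalities of (1.8) and the two of (1.9) become rows, and for each of the 8 choices allowed by (1.7) we add rows forcing the chosen distributions to vanish. An exact rank computation with `fractions.Fraction` then shows that the all-zero solution is the only one; positivity is not even needed.

```python
from fractions import Fraction
from itertools import product

# Unknowns at a fixed lambda; "half" stands for mu_{1/2}(lambda).
NAMES = ["phi", "Phi", "chi", "X", "psi", "Psi", "half"]

RELATIONS = [
    [1, 1, 0, 0, 0, 0, -2],   # (1.8): 2 mu_half = mu_phi + mu_Phi
    [0, 0, 1, 1, 0, 0, -2],   # (1.8): 2 mu_half = mu_chi + mu_X
    [0, 0, 0, 0, 1, 1, -2],   # (1.8): 2 mu_half = mu_psi + mu_Psi
    [1, 0, 1, 0, 1, 0, -3],   # (1.9a): 3 mu_half = mu_phi + mu_chi + mu_psi
    [0, 1, 0, 1, 0, 1, -3],   # (1.9b): 3 mu_half = mu_Phi + mu_X + mu_Psi
]


def rank(rows):
    """Rank of an integer matrix by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def only_zero_solution(zeroed):
    """True if forcing the listed unknowns to zero leaves only the trivial solution."""
    rows = [list(row) for row in RELATIONS]
    for idx in zeroed:
        rows.append([1 if k == idx else 0 for k in range(len(NAMES))])
    return rank(rows) == len(NAMES)


# (1.7): in each orthogonal pair at least one distribution vanishes at lambda.
cases = {}
for choice in product((0, 1), repeat=3):
    zeroed = [2 * pair + bit for pair, bit in enumerate(choice)]
    cases[tuple(NAMES[i] for i in zeroed)] = only_zero_solution(zeroed)

print(rank(RELATIONS), all(cases.values()))  # expected: 4 True
```

Without the disjointness rows the system has rank 4, leaving a three-parameter family of solutions (for instance, all distributions equal), which is why lemma 13 is essential to the argument.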



1.6 Gleason theorems

There are three theorems that I call "Gleason theorems": von Neumann's theorem [15], Gleason's theorem [ ] and Busch's theorem [ ]. Of these three, the most famous is certainly Gleason's¹⁹, and that is why I chose to name this section after it. All three theorems share a similar structure: they postulate some properties that a measure $\mu$ should have, and then prove that the only measure that satisfies those properties is the quantum mechanical one, $\mu(A) = \operatorname{tr}\rho A$. They can be interpreted in two ways:

1. As an axiomatic improvement, by showing that the notion of quantum state and Born's rule follow from weaker axioms.

2. As excluding deterministic ontological theories, by arguing that the properties of $\mu$ should hold in any theory, not only in quantum mechanics. Then one only has to notice that Born's rule is not deterministic.

If one chooses the first interpretation, all three theorems are perfectly fine, and in fact quite similar. Problems arise, however, if one insists on interpreting them as excluding deterministic ontological theories. Then von Neumann's theorem becomes foolish²⁰ [ ], as its assumptions already exclude a large class of ontological theories without good reason.

1.6.1 von Neumann's theorem 

Theorem 15 (von Neumann [15]). Let $A, B$ be self-adjoint operators, and $\mu: \mathcal{O}(\mathcal{H}) \to \mathbb{R}$ a function such that

1. $\mu(aA) = a\mu(A)$ for real $a$.

2. $\mu(A + B) = \mu(A) + \mu(B)$ for commuting $A, B$.

3. $\mu(A + B) = \mu(A) + \mu(B)$ for non-commuting $A, B$.

4. $\mu(\mathbb{1}) = 1$.

5. $\mu(A) \geq 0$ for positive $A$.

Then any such function can be written as

$$\mu(A) = \operatorname{tr}\rho A,$$

where $\rho$ is a positive operator of unit trace.

Proof. Properties 1, 2, and 3 establish that $\mu$ is a linear functional on $\mathcal{O}(\mathcal{H})$, and by the Riesz lemma it can be represented as an inner product, $\mu(A) = \operatorname{tr}\rho A$. Property 4 then implies that $\rho$ has unit trace, as $1 = \mu(\mathbb{1}) = \operatorname{tr}\rho\mathbb{1} = \operatorname{tr}\rho$, and property 5 implies its positivity, since in particular projectors are positive operators, and $\mu(|\psi\rangle\langle\psi|) = \operatorname{tr}\rho|\psi\rangle\langle\psi| = \langle\psi|\rho|\psi\rangle \geq 0$ for all $\psi$ is the definition of positivity. □
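To make the use of the Riesz lemma concrete in the simplest case, here is a minimal Python sketch for a qubit (our own illustration, not part of von Neumann's argument). Assuming $\mu$ is already known to be a real linear functional on $2\times 2$ self-adjoint matrices, the relation $\operatorname{tr}\sigma_j\sigma_k = 2\delta_{jk}$ (with $\sigma_0 = \mathbb{1}$) gives $\rho = \frac{1}{2}\sum_{k=0}^{3}\mu(\sigma_k)\sigma_k$. The functional in the example is built from a hypothetical density matrix `rho0`, which the function recovers from the four numbers $\mu(\sigma_k)$ alone.

```python
# Identity and Pauli matrices: an orthogonal basis of the 2x2 self-adjoint
# matrices under the Hilbert-Schmidt inner product tr(AB).
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]
BASIS = [I2, SX, SY, SZ]


def trace_of_product(a, b):
    """tr(AB) for 2x2 matrices stored as nested lists."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))


def riesz_density_matrix(mu):
    """Return rho with mu(A) = tr(rho A), assuming mu is a real linear
    functional on 2x2 self-adjoint matrices.

    Uses tr(sigma_j sigma_k) = 2 delta_jk, so rho = (1/2) sum_k mu(sigma_k) sigma_k.
    """
    rho = [[0j, 0j], [0j, 0j]]
    for s in BASIS:
        c = mu(s) / 2
        for i in range(2):
            for j in range(2):
                rho[i][j] += c * s[i][j]
    return rho


# Hypothetical functional, defined through a density matrix we pretend not to know.
rho0 = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]


def mu(a):
    return trace_of_product(rho0, a).real


rho = riesz_density_matrix(mu)
print(rho)  # recovers rho0 up to rounding
```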

19 The most infamous being von Neumann's. Busch's theorem is still new.
20 The hasty reader might wonder why one should learn a foolish theorem. A quick answer would be: to avoid repeating the mistakes of the past [29, 30]. For a longer answer, read the section.
We can see, then, that the theorem itself is quite simple, and its value rests entirely on its assumptions, which we shall examine now. The first thing one may notice is that the theorem already makes use of the Hilbert space formalism for the observables, so that the fact that the states also follow the same formalism seems almost like a tautology. But this is not the case. Quantum mechanics already implements this formalism in experiments quite successfully, and one may regard an observable $A$ as just a proxy for the experiment that implements it; as $\mu$ can be any function a priori (we don't even assume it is continuous), there is no limitation in using $\mathcal{O}(\mathcal{H})$ as its domain. We shall now proceed to examine the physical content of the assumptions.

Assumptions 1 and 2 can be interpreted as classical post-processing of the data of a single experiment, the measurement of a PVM $\{\Pi_i\}$ defined by the eigendecomposition of $A$. The multiplication of $A$ by a constant is implemented just by multiplying its eigenvalues by the same constant. To implement the observable $A + B$ corresponding to the sum of commuting operators $A$ and $B$, one notices that they can be diagonalized simultaneously as $A = \sum_i a_i\Pi_i$ and $B = \sum_i b_i\Pi_i$, and so their sum $A + B = \sum_i (a_i + b_i)\Pi_i$ is just a combination and rescaling of the data coming from the $\Pi_i$ outputs. Assumptions 4 and 5 can be justified by the possibility of interpreting $\mu(\Pi_i)$ as a probability: probabilities are non-negative, and some outcome must happen.

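The one which is harder to justify is assumption 3, since $A$, $B$, and $A + B$ correspond to different experimental configurations, so the possibility of measuring $A + B$ just by processing the data coming from the PVMs that measure $A$ or $B$ is excluded. Its justification comes from the fact that in quantum mechanics $\operatorname{tr}\rho(A + B) = \operatorname{tr}\rho A + \operatorname{tr}\rho B$, and our ontological theory must reproduce its results. But this is where von Neumann slips, and to make the slip clearer it's best to use the ontological notation, the correspondence being $\mu(A) = \xi_A(\lambda)$. So assumption 3 translates to

$$\xi_{A+B}(\lambda) = \xi_A(\lambda) + \xi_B(\lambda) \quad \forall\lambda,$$

which is clearly overkill, since correspondence with quantum mechanics only requires that

$$\int_\Lambda \mu_\rho(\lambda)\,\xi_{A+B}(\lambda)\,d\lambda = \int_\Lambda \mu_\rho(\lambda)\,\xi_A(\lambda)\,d\lambda + \int_\Lambda \mu_\rho(\lambda)\,\xi_B(\lambda)\,d\lambda \quad \forall\rho,$$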
that is, that the expected values correspond, not the values of the response functions themselves. For instance, in the Bell-Mermin model, discussed in appendix A, we can see that the response function (A.1) is clearly linear with respect to the sum of commuting observables²¹,

$$\xi_{A+B}(\lambda) = \xi_A(\lambda) + \xi_B(\lambda),$$




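21 Here $A = a_0\mathbb{1} + \vec a\cdot\vec\sigma$ and $B = b_0\mathbb{1} + \vec b\cdot\vec\sigma = b_0\mathbb{1} + \alpha\vec a\cdot\vec\sigma$. Note that $A$ and $B$ commute iff $\vec b = \alpha\vec a$ for some real $\alpha$.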
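as

$$\begin{aligned}
\xi_{A+B}(\psi,\lambda) &= a_0 + b_0 + \|\vec a + \alpha\vec a\|\,\operatorname{sign}\big((\vec a + \alpha\vec a)\cdot(\vec\lambda + \vec\psi)\big)\\
&= a_0 + b_0 + |1 + \alpha|\,\|\vec a\|\,\operatorname{sign}(1 + \alpha)\,\operatorname{sign}\big(\vec a\cdot(\vec\lambda + \vec\psi)\big)\\
&= a_0 + \|\vec a\|\,\operatorname{sign}\big(\vec a\cdot(\vec\lambda + \vec\psi)\big) + b_0 + \alpha\|\vec a\|\,\operatorname{sign}\big(\vec a\cdot(\vec\lambda + \vec\psi)\big)\\
&= \xi_A(\psi,\lambda) + \xi_B(\psi,\lambda),
\end{aligned}$$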

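since the values that $\xi_A$ assumes are the eigenvalues of $A$, and eigenvalues are linear with respect to the sum of commuting observables. Of course, this is not true when the observables do not commute, as we can see in the following example:

$$\begin{aligned}
\xi_{\sigma_x+\sigma_z}(\psi,\lambda) &= \sqrt{2}\,\operatorname{sign}(\lambda_x + \psi_x + \lambda_z + \psi_z)\\
&\neq \operatorname{sign}(\lambda_x + \psi_x) + \operatorname{sign}(\lambda_z + \psi_z)\\
&= \xi_{\sigma_x}(\psi,\lambda) + \xi_{\sigma_z}(\psi,\lambda).
\end{aligned}$$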
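The additivity for commuting observables, and its failure for $\sigma_x + \sigma_z$, can be checked numerically. The sketch below is our own and assumes that (A.1) has the form used in the derivation above, $\xi_A(\psi,\lambda) = a_0 + \|\vec a\|\operatorname{sign}(\vec a\cdot(\vec\lambda + \vec\psi))$ for $A = a_0\mathbb{1} + \vec a\cdot\vec\sigma$, with $\vec\psi$ the Bloch vector of the state and $\vec\lambda$ uniformly distributed on the unit sphere; the function names are illustrative. It also estimates the expectation of $\sigma_z$ by Monte Carlo, which under these assumptions should approach $\cos\theta$ for a state at angle $\theta$ from the $z$ axis, as quantum mechanics requires.

```python
import math
import random


def sign(x):
    return (x > 0) - (x < 0)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def response(a0, a, psi, lam):
    """Assumed Bell-Mermin response function for A = a0*1 + a.sigma:
    xi_A(psi, lam) = a0 + |a| * sign(a . (lam + psi))."""
    shifted = [l + p for l, p in zip(lam, psi)]
    return a0 + math.sqrt(dot(a, a)) * sign(dot(a, shifted))


def random_unit_vector(rng):
    """Uniform sample on the unit sphere (normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def expectation(a0, a, psi, samples=100_000, seed=0):
    """Monte Carlo average of xi_A(psi, lam) over lam uniform on the sphere."""
    rng = random.Random(seed)
    total = sum(response(a0, a, psi, random_unit_vector(rng)) for _ in range(samples))
    return total / samples


if __name__ == "__main__":
    theta = 1.0
    psi = [math.sin(theta), 0.0, math.cos(theta)]
    # The two numbers should be close: model average vs quantum prediction.
    print(expectation(0.0, [0.0, 0.0, 1.0], psi), math.cos(theta))
```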

Therefore, we must conclude that this assumption is unfounded, and if no justification can be found for it, we must abandon von Neumann's prohibition of ontological models. We shall see, however, that even if we abandon this assumption, we can still prove a von Neumann-like theorem, valid in a more restricted context: that is Gleason's theorem. More surprising, however, is the fact that this assumption can be justified by considering POVMs. This realisation is what motivated the proof of Busch's theorem.

1.6.2 Gleason's theorem 

Andrew Gleason was not concerned with von Neumann's theorem, nor even with the problem of ontological models for quantum mechanics. His goal was to study the mathematical foundations of quantum mechanics, and to strengthen its axiomatic basis by showing that essentially every measure on a Hilbert space is given by Born's rule [ ]. Its significance for the exclusion of ontological models of quantum mechanics was first noticed by Bell [ ], who also remarked that contextual ontological models were not bound by Gleason's theorem.

Theorem 16 (Gleason [ ]). Let $\mathcal{H}$ be a separable Hilbert space over $\mathbb{C}$ with $\dim\mathcal{H} \geq 3$, and $\mu: \mathcal{P}(\mathcal{H}) \to [0,1]$ a function such that $\sum_i \mu(\Pi_i) = 1$ for any PVM $\{\Pi_i\}$. Then any such function can be written as

$$\mu(\Pi_i) = \operatorname{tr}\rho\Pi_i,$$

where $\rho$ is a positive operator of unit trace.

The proof of this theorem is already well-known, and a bit boring, so we 
shall omit it. The interested reader may find it in the original work [27], or in 
the clearer version by Bell [ ]. 

It is easy to see that von Neumann's $\mu$ functions satisfy all the properties of Gleason's $\mu$ functions, and continue to do so even if we drop his questionable assumption 3, so it is certainly possible to interpret Gleason's theorem as a "reasonable" von Neumann theorem, with weaker assumptions. Also notice that Gleason's assumptions are explicitly noncontextual, since $\mu(\Pi_i)$ is assumed to be a function only of the projector $\Pi_i$, and not of the whole PVM.

1.6.3 Busch's theorem 

Paul Busch was concerned with the justification of von Neumann's assumption 3. He noticed that if one measures a POVM $\{E_i\}$ instead of a PVM, then it is possible to have in a single experiment two outcomes $E_0$ and $E_1$ that do not commute²², so it is perfectly natural to demand that $\mu(E_0 + E_1) = \mu(E_0) + \mu(E_1)$, since one can measure $E_0 + E_1$ just by combining the outcomes corresponding to $E_0$ and $E_1$. He then restricted assumption 3 to sums of effects belonging to a single POVM, and was able to derive Born's rule from it, thus resurrecting von Neumann's theorem [ ]. Later he realized that the form of his theorem was actually closer to Gleason's than to von Neumann's; to obtain it from Gleason's one only has to demand that $\sum_i \mu(E_i) = 1$ hold for POVMs, instead of just for PVMs. Interpreted in this way, his theorem is a much stronger version of Gleason's with a much simpler proof [28].

The proof presented here mostly follows the one presented in [1], with the difference that it does not require the domain of $\mu$ to be extended.

Theorem 17 (Busch [ 8]). Let $\mathcal{H}$ be a separable Hilbert space over²³ $\mathbb{Q}[i]$ or $\mathbb{C}$, and $\mu: \mathcal{E}(\mathcal{H}) \to [0,1]$ a function such that $\sum_i \mu(E_i) = 1$ for any POVM $\{E_i\}$. Then any such function can be written as

$$\mu(E_i) = \operatorname{tr}\rho E_i,$$

where $\rho$ is a positive operator of unit trace.

Proof. The proof begins by noticing that $\mu$ is in fact a linear functional on $\mathcal{E}(\mathcal{H})$. From that, the Riesz lemma establishes that it can be represented as an inner product. Positivity and normalization of $\rho$ then come from the positivity and normalization of $\mu$. We shall first prove the case where $\mathcal{H}$ is over the complex rationals, and later extend the proof to the continuum.

First note that if $E$ is an effect, $\mathbb{1} - E$ is also an effect. Then considering the POVMs $\{E, \mathbb{1} - E\}$ and $\{E_1, E_2, \ldots, E_n, \mathbb{1} - E\}$, where $\sum_i E_i = E$, we see that $\mu(E) = \sum_i \mu(E_i)$. Considering the particular case $E_i = E/n$, we get that $\mu(E) = n\mu(E/n)$. On the other hand, if we consider $E = mF$ and $E_i = F$, we get $\mu(mF) = m\mu(F)$. Combining these two cases, we see that $\mu(\frac{m}{n}E) = m\mu(\frac{1}{n}E) = \frac{m}{n}\mu(E)$, that is, $\mu(qE) = q\mu(E)$ for $q \in \mathbb{Q}^+$ whenever both $qE$ and $E$ are effects. Wrapping up, we have that

$$\mu\Big(\sum_i q_iE_i\Big) = \sum_i q_i\,\mu(E_i)$$

for rational $q_i \geq 0$ whenever the $q_iE_i$ (and their sum) are effects, so $\mu$ already has some restricted linearity. If we can remove the restriction that the $q_iE_i$ are effects, we get full linearity on $\mathcal{E}(\mathcal{H})$, and that's what we'll do now.

22 This happens in many non-projective POVMs (though not, for instance, in two-outcome ones, since $E$ and $\mathbb{1} - E$ always commute).

23 $\mathbb{Q}[i]$ is the field extension of the rationals $\mathbb{Q}$ with the imaginary unit $i$, $\mathbb{Q}[i] = \{a + ib : a, b \in \mathbb{Q}\}$.
Consider the effects $E$ and $F \leq E$. Then $E = (E - F) + F$, and $\mu(E) = \mu(E - F) + \mu(F)$, so $\mu(E - F) = \mu(E) - \mu(F)$. Consider now $E, F, G \in \mathcal{E}(\mathcal{H})$ and $p, q \in \mathbb{Q}^+$ such that $E = pF - qG$, but at least one of $p$ and $q$ is larger than unity, so $pF$ and $qG$ are not necessarily effects. Without loss of generality let $p \geq q$. Then $E/p$, $F$, and $(q/p)G$ are all effects, and by the property we just proved,

$$\mu(E/p) = \mu(F) - \mu\big((q/p)G\big),$$

so $\mu(E) = p\mu(F) - q\mu(G)$ and

$$\mu\Big(\sum_i q_iE_i\Big) = \sum_i q_i\,\mu(E_i)$$

for any rational $q_i$ (whenever the combination is itself an effect), so we have full linearity on $\mathcal{E}(\mathcal{H})$. Let then $\{E_i\}_{i=1}^{d^2}$ be a MIC-POVM and, as such, a basis for the space of self-adjoint operators on $\mathcal{H}$. Then any effect $E$ can be written as $E = \sum_{i=1}^{d^2} q_iE_i$ for $q_i \in \mathbb{Q}$ (a moment's thought will convince you that complex numbers aren't allowed). We can now define $\rho$ by solving the $d^2$ equations $\operatorname{tr}\rho E_i = \mu(E_i)$, and see that

$$\mu(E) = \sum_{i=1}^{d^2} q_i\,\mu(E_i) = \sum_{i=1}^{d^2} q_i\operatorname{tr}\rho E_i = \operatorname{tr}\Big[\rho\sum_{i=1}^{d^2} q_iE_i\Big] = \operatorname{tr}\rho E.$$

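Positivity of $\rho$ comes from considering the case where $E$ is a one-dimensional projector:

$$0 \leq \operatorname{tr}\rho E = \operatorname{tr}\rho|\varphi\rangle\langle\varphi| = \langle\varphi|\rho|\varphi\rangle.$$

The unity of the trace comes from

$$1 = \sum_i \mu(E_i) = \sum_i \operatorname{tr}\rho E_i = \operatorname{tr}\rho\sum_i E_i = \operatorname{tr}\rho.$$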

This completes the proof for $\mathbb{Q}[i]$. To extend it to the continuum, note again that if $E \geq F$, then $\mu(E) = \mu(E - F) + \mu(F)$, and so $\mu(E) \geq \mu(F)$. Let then $p_i$ and $q_i$ be sequences of rational numbers tending to the real number $\alpha$ such that $p_i \leq \alpha \leq q_i$. We have $p_iE \leq \alpha E \leq q_iE$, and as such $p_i\mu(E) \leq \mu(\alpha E) \leq q_i\mu(E)$, so $\mu(\alpha E) = \alpha\mu(E)$. From this fact, one can now retrace the proof and see that it also holds for $\mathbb{C}$. □
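A small qubit example may help to see how $\rho$ is fixed by the values of $\mu$ on a MIC-POVM. In the Python sketch below (our own illustration: $d = 2$ and floating point, so it ignores the $\mathbb{Q}[i]$ subtlety) we choose as MIC-POVM the tetrahedral SIC-POVM $E_i = \frac{1}{4}(\mathbb{1} + \vec n_i\cdot\vec\sigma)$, for which the $d^2 = 4$ equations $\operatorname{tr}\rho E_i = \mu(E_i)$ can be solved in closed form for the Bloch vector of $\rho$. The valuation `mu_sic` is hypothetical, defined from a hidden Bloch vector `r_hidden`; linearity then determines $\mu$ on any other effect through its coefficients $q_i$ in the SIC basis.

```python
import math

# Tetrahedral (SIC) qubit POVM: E_i = (1 + n_i . sigma) / 4, with the n_i at the
# vertices of a regular tetrahedron, so sum_i n_i = 0 and
# sum_i n_i n_i^T = (4/3) * identity.
T = 1 / math.sqrt(3)
N = [(T, T, T), (T, -T, -T), (-T, T, -T), (-T, -T, T)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def reconstruct_bloch_vector(mu_values):
    """Solve tr(rho E_i) = mu(E_i), i = 1..4, for rho = (1 + r . sigma) / 2.

    Since tr(rho E_i) = (1 + r . n_i) / 4, the identities above give
    r = 3 * sum_i mu(E_i) n_i.
    """
    return tuple(3 * sum(m * n[k] for m, n in zip(mu_values, N)) for k in range(3))


def sic_coefficients(e0, e):
    """Real coefficients q_i such that e0*1 + e.sigma = sum_i q_i E_i."""
    return [e0 + 3 * dot(e, n) for n in N]


# Hypothetical valuation, known to us only through its values on the SIC effects.
r_hidden = (0.3, -0.2, 0.5)
mu_sic = [(1 + dot(r_hidden, n)) / 4 for n in N]

r = reconstruct_bloch_vector(mu_sic)

# Linearity then fixes mu on any other effect E = e0*1 + e.sigma:
e0, e = 0.5, (0.1, 0.2, -0.3)
mu_E = sum(q * m for q, m in zip(sic_coefficients(e0, e), mu_sic))
print(r, mu_E, e0 + dot(r, e))  # the last two agree: mu(E) = tr(rho E)
```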

The reason that we decided to highlight the fact that Busch's theorem holds for $\mathbb{Q}[i]$ is that the original Gleason theorem fails for it, hinting that traditional contextuality might have problems dealing with subsets of $\mathbb{C}$ [^2, 33]. This feature of Busch's theorem was first noticed in [ ].



1.6.4 Wrapping up 

Busch's theorem is clearly superior to von Neumann's in every way, but this is not true for Gleason's: they can be interpreted in different ways. Busch's shows that there can't be a noncontextual deterministic model capable of reproducing quantum mechanics in any dimension, while Gleason's opens up the possibility of such a model existing in dimension two, if we only care about projective measurements. That such a model exists can be seen by looking at the Bell-Mermin model in appendix A; but if, like Gleason, the reader is not interested in the question of ontological theories, but in which measures are allowed given the Hilbert space structure of observables, the following counterexample²⁴ should suffice:

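$$\mu_\psi(\phi) = \frac{1}{2}\Big(1 + \cos\big(n\cos^{-1}(\vec\phi\cdot\vec\psi)\big)\Big), \qquad n \text{ odd},$$

where $\vec\phi$ and $\vec\psi$ are the Bloch vectors of the pure states $\phi$ and $\psi$. Note that for $n = 1$ this formula is simply Born's rule. It is easy to check that

$$\begin{aligned}
\sum_i \mu_\psi(\Pi_i) &= \mu_\psi(\phi) + \mu_\psi(\phi^\perp)\\
&= \tfrac{1}{2}\Big(1 + \cos\big(n\cos^{-1}(\vec\phi\cdot\vec\psi)\big)\Big) + \tfrac{1}{2}\Big(1 + \cos\big(n(\pi - \cos^{-1}(\vec\phi\cdot\vec\psi))\big)\Big)\\
&= \tfrac{1}{2}\Big(1 + \cos\big(n\cos^{-1}(\vec\phi\cdot\vec\psi)\big)\Big) + \tfrac{1}{2}\Big(1 - \cos\big(n\cos^{-1}(\vec\phi\cdot\vec\psi)\big)\Big)\\
&= 1,
\end{aligned}$$

as required in Gleason's assumptions. (The Bloch vector of $\phi^\perp$ is $-\vec\phi$, and for odd $n$ we have $\cos(n(\pi - \theta)) = -\cos(n\theta)$.)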

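To see that for odd $n \geq 3$ this formula can't equal Born's rule, notice that

$$\operatorname{tr}\psi\phi = \tfrac{1}{2}(1 + \vec\phi\cdot\vec\psi),$$

considered as a function of the angle $\cos^{-1}(\vec\phi\cdot\vec\psi) \in [0, \pi]$, has only one root, whereas our $\mu_\psi(\phi)$ has $(n+1)/2$ roots.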
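The two properties used above, normalization on every PVM for odd $n$ and the number of roots, are easy to check numerically. The following Python sketch is our own illustration; it parametrizes everything by $c = \vec\phi\cdot\vec\psi$, evaluates the counterexample on a grid of values of $c$, compares $n = 1$ with Born's rule, and lists the roots $\theta_k = (2k+1)\pi/n$, $k = 0, \ldots, (n-1)/2$, in $[0,\pi]$.

```python
import math


def mu(n, cos_angle):
    """Counterexample valuation mu_psi(phi) = (1 + cos(n * arccos(phi.psi))) / 2,
    written in terms of cos_angle = phi.psi (dot product of Bloch vectors)."""
    return 0.5 * (1 + math.cos(n * math.acos(cos_angle)))


def born(cos_angle):
    """Born's rule for qubit pure states: tr(psi phi) = (1 + phi.psi) / 2."""
    return 0.5 * (1 + cos_angle)


def roots(n):
    """Angles theta in [0, pi] with 1 + cos(n theta) = 0, i.e. n theta = (2k+1) pi."""
    return [(2 * k + 1) * math.pi / n for k in range((n + 1) // 2)]


grid = [-1 + 2 * t / 200 for t in range(201)]  # values of phi.psi in [-1, 1]
for n in (1, 3, 5, 7):
    # Normalization on every PVM {phi, phi^perp}; phi^perp has Bloch vector -phi.
    worst = max(abs(mu(n, c) + mu(n, -c) - 1) for c in grid)
    print(n, worst < 1e-12, len(roots(n)))
```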

1.7 The Kochen-Specker theorem 

A corollary of the Gleason theorem is that one can't embed quantum theory in a noncontextual deterministic ontological model if $\dim\mathcal{H} \geq 3$, since the Born rule is explicitly noncontextual and non-deterministic; a direct proof of this fact might therefore seem superfluous. But one might not like its assumptions: after all, it already assumes a fair bit of structure that is not quite needed and, more importantly, it needs to assume that the quantum valuation $\mu(\Pi_i)$ is defined for a continuum of projectors, which of course can never have experimental justification. This was the motivation²⁵ for Simon Kochen and Ernst Specker to develop a finite proof of contextuality, finding an inconsistency in any deterministic noncontextual assignment of values to a set of experiments realizable in quantum mechanics [14]. Another motivation to present it here is that it proves the claim in section 1.2.2 that noncontextual deterministic ontological models cannot describe two-outcome PVMs.

In modern parlance, the Kochen-Specker theorem is referred to as a proof 
of state-independent contextuality, as the logical contradiction found depends 
only on the structure of quantum observables, and not on the statistics from 
the measurement of specific states. This situation contrasts, of course, with 

24 Due to Marcelo Terra Cunha and Rafael Rabelo. 

25 The motivation could have come from Gleason's theorem, or from a 1960 work of Specker [35, 36], which was independent of Gleason's and also contained a "continuous" proof of contextuality.
proofs of state-dependent contextuality, which we shall explore mainly in the next chapter.

More specifically, their proof says that we can't attribute deterministic values $\xi_{\Pi_i}(\lambda)$ to a set of projectors $\{\Pi_i\}$ in dimension three respecting the quantum mechanical observation that in the measurement of a PVM one answer (and only one answer) always occurs. An elegant way to proceed with the proof is to represent this set of projectors in an orthogonality graph (where each vertex corresponds to a projector, and two vertices are connected iff the corresponding projectors are orthogonal), and map the quantum mechanical observation into two rules for colouring the graph:

1. Two connected vertices can't both have the value 1: if two projectors $\Pi_i$ and $\Pi_j$ are orthogonal, they can be measured simultaneously as outcomes of a single PVM, and therefore $\xi_{\Pi_i}(\lambda)$ and $\xi_{\Pi_j}(\lambda)$ can't both equal 1.

2. In a triangle of three mutually connected vertices, exactly one must have the value 1: if three projectors are mutually orthogonal, they form a PVM, and in a PVM one answer (and only one answer) always occurs.

The proof concludes by showing that no such colouring of the graph can exist, and therefore one can't attribute deterministic values to this set of projectors. We shall, however, omit it. Even though it is quite beautiful, the proof is mainly of historical interest, as simpler proofs have since been found. We refer the interested reader to the original paper, or to the excellent exposition of it by Cabello [37].

1.7.1 An 18-projector proof by Cabello, Estebaranz, and Garcia-Alcaine

The simplest (with fewest projectors) such no-colouring proof that we currently know²⁶ was found in 1996 by Cabello, Estebaranz, and Garcia-Alcaine [40]. In contrast with Kochen and Specker's 117 projectors, it needs only 18 to generate a contradiction. These projectors are represented in figure 1.2, where $v = (a, b, c, d)$ is just a shorthand notation for the projector onto $|v\rangle = a|0\rangle + b|1\rangle + c|2\rangle + d|3\rangle$ (up to normalization). This figure does not represent an orthogonality graph, which would be quite cumbersome, but an orthogonality hypergraph, where sets of four mutually orthogonal projectors are connected by edges of the same colour.

One could in fact proceed to prove directly that it is non-colourable (there are few non-equivalent potential colourings), but it is more elegant to use a parity argument. There are nine contexts, and in each of them exactly one answer must be 1, so the sum of the answers over all contexts must be 9. But if we do this sum projector by projector, we see that each projector appears in exactly two contexts, so each answer is counted twice and the sum must be an even number, a contradiction.
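The parity argument can also be confirmed by brute force. The sketch below assumes one standard grouping of the 18 vectors into 9 orthogonal bases, transcribed by us (its labelling need not match figure 1.2). It checks that each basis is orthogonal, that every vector lies in exactly two bases, and, by scanning all 2^18 assignments encoded as bitmasks, that none gives exactly one value 1 in every basis.

```python
from itertools import combinations

# Nine contexts (orthogonal bases of C^4); vectors are written up to normalization.
CONTEXTS = [
    [(0, 0, 0, 1), (0, 0, 1, 0), (1, 1, 0, 0), (1, -1, 0, 0)],
    [(0, 0, 0, 1), (0, 1, 0, 0), (1, 0, 1, 0), (1, 0, -1, 0)],
    [(1, -1, 1, -1), (1, -1, -1, 1), (1, 1, 0, 0), (0, 0, 1, 1)],
    [(1, -1, 1, -1), (1, 1, 1, 1), (1, 0, -1, 0), (0, 1, 0, -1)],
    [(0, 0, 1, 0), (0, 1, 0, 0), (1, 0, 0, 1), (1, 0, 0, -1)],
    [(1, -1, -1, 1), (1, 1, 1, 1), (1, 0, 0, -1), (0, 1, -1, 0)],
    [(1, 1, -1, 1), (1, 1, 1, -1), (1, -1, 0, 0), (0, 0, 1, 1)],
    [(1, 1, -1, 1), (-1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, -1)],
    [(1, 1, 1, -1), (-1, 1, 1, 1), (1, 0, 0, 1), (0, 1, -1, 0)],
]

VECTORS = sorted({v for ctx in CONTEXTS for v in ctx})
INDEX = {v: i for i, v in enumerate(VECTORS)}
# Each context as a bitmask over the 18 vectors.
MASKS = [sum(1 << INDEX[v] for v in ctx) for ctx in CONTEXTS]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def contexts_are_orthogonal():
    return all(dot(u, v) == 0 for ctx in CONTEXTS for u, v in combinations(ctx, 2))


def appearances():
    """Number of contexts each vector belongs to."""
    return [sum(v in ctx for ctx in CONTEXTS) for v in VECTORS]


def find_colouring():
    """Search all 2**18 assignments for one with exactly one 1 in every context."""
    for assignment in range(1 << len(VECTORS)):
        if all(bin(assignment & mask).count("1") == 1 for mask in MASKS):
            return assignment
    return None


print(len(VECTORS), contexts_are_orthogonal(), set(appearances()))  # 18 True {2}
print(find_colouring())  # None
```

The exhaustive search is the brute-force counterpart of the parity argument: an odd number of contexts, each contributing one value 1, cannot be matched by a vector-by-vector count in which every value appears twice.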



26 We do know that in dimensions 3 and 4 there are no no-colouring proofs with 17 or fewer projectors [38, 39].
Figure 1.2: Vectors for the 18-projector proof of the Kochen-Specker theorem. Reproduced from [ ] with permission from the author.

1.7.2 A 13-projector proof by Yu and Oh 

Shockingly, it has more recently been found that a non-colourable graph is not necessary to prove state-independent contextuality. Yu and Oh [ ] have found such a proof in dimension 3 based on a set of 13 projectors that does have a colouring obeying rules 1 and 2; instead, they argue that every possible colouring of their graph contradicts another prediction of quantum theory. The orthogonality graph is represented in figure 1.3, and its quantum realization is given by the vectors

$$z_1 = (1,0,0), \quad z_2 = (0,1,0), \quad z_3 = (0,0,1),$$
$$y_1^\pm = (0,1,\pm 1), \quad y_2^\pm = (\pm 1,0,1), \quad y_3^\pm = (1,\pm 1,0),$$
$$h_0 = (1,1,1), \quad h_1 = (-1,1,1), \quad h_2 = (1,-1,1), \quad h_3 = (1,1,-1),$$

where $r = (a,b,c)$ is just a shorthand notation for the projector onto $|r\rangle \propto a|0\rangle + b|1\rangle + c|2\rangle$. It is important for the proof that this is actually the unique quantum realization of the orthogonality graph up to a global unitary transformation, which is trivial to prove.

Figure 1.3: Orthogonality graph for the proof of Yu and Oh. Reproduced from [43] with permission from the authors.

To obtain the contradiction with quantum mechanics, first note that no two $h_k$ can be assigned 1 simultaneously. We shall prove this by contradiction. By the symmetry of the graph, there are only two cases:

1. Assume that $\xi_{h_0}(\lambda) = \xi_{h_1}(\lambda) = 1$. Then by the KS rules we must assign 0 to $y_2^\pm$ and $y_3^\pm$, which obliges us to assign 1 to both $z_2$ and $z_3$, a contradiction.

2. Assume that $\xi_{h_1}(\lambda) = \xi_{h_2}(\lambda) = 1$. Then by the KS rules we must assign 0 to $y_1^\pm$ and $y_2^\pm$, which obliges us to assign 1 to both $z_1$ and $z_2$, a contradiction.

This implies that $\sum_k \xi_{h_k}(\lambda) \le 1$, and furthermore that
$$\sum_k \int_\Lambda d\lambda\, \mu_\rho(\lambda)\, \xi_{h_k}(\lambda) \le 1.$$
But the left-hand side must be equal to the quantum expectation value $\sum_k \operatorname{tr}\rho h_k$, and since
$$\sum_k h_k = \frac{4}{3}\mathbb{1},$$
we get that $\sum_k \operatorname{tr}\rho h_k = 4/3$ for any state $\rho$, a contradiction.
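A minimal Python check of the ingredients of this argument, assuming the labelling $y_1^\pm = (0,1,\pm1)$, $y_2^\pm = (\pm1,0,1)$, $y_3^\pm = (1,\pm1,0)$ for the six vectors other than the $z_k$ and $h_k$. It computes the orthogonality graph from dot products, enumerates all $2^{13}$ assignments obeying rules 1 and 2, and confirms that colourings exist but that none gives the value 1 to two of the $h_k$; it also evaluates $\sum_k h_k$ exactly with rational arithmetic. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product

Z = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
Y = [(0, 1, 1), (0, 1, -1), (1, 0, 1), (-1, 0, 1), (1, 1, 0), (1, -1, 0)]
H = [(1, 1, 1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]
VECTORS = Z + Y + H          # indices 9..12 are h_0, ..., h_3
H_INDICES = range(9, 13)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


EDGES = [(i, j) for i, j in combinations(range(13), 2)
         if dot(VECTORS[i], VECTORS[j]) == 0]
TRIANGLES = [t for t in combinations(range(13), 3)
             if all(dot(VECTORS[a], VECTORS[b]) == 0 for a, b in combinations(t, 2))]


def valid_colourings():
    """All 0/1 assignments obeying the KS rules 1 and 2 on the orthogonality graph."""
    for c in product((0, 1), repeat=13):
        if any(c[i] and c[j] for i, j in EDGES):
            continue
        if all(any(c[k] for k in t) for t in TRIANGLES):
            yield c


def sum_of_h_projectors():
    """Exact matrix sum_k |h_k><h_k| / <h_k|h_k>."""
    m = [[Fraction(0)] * 3 for _ in range(3)]
    for h in H:
        norm2 = dot(h, h)
        for i in range(3):
            for j in range(3):
                m[i][j] += Fraction(h[i] * h[j], norm2)
    return m


colourings = list(valid_colourings())
print(len(colourings) > 0)                                    # True
print(max(sum(c[k] for k in H_INDICES) for c in colourings))  # 1
print([[str(x) for x in row] for row in sum_of_h_projectors()])
# [['4/3', '0', '0'], ['0', '4/3', '0'], ['0', '0', '4/3']]
```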

1.7.3 A 9-observable proof by Peres and Mermin

Last but not least, we'd like to present the beautiful proof of the Kochen-Specker theorem found in 1990 by Asher Peres and David Mermin [44, 45], the Peres-Mermin square. It uses 9 four-dimensional observables, so in some sense it is larger than the previous two proofs, and it is also older; but it is quite elegant, and so it might seem smaller to the human mind.
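Let
$$\begin{pmatrix}
\sigma_z \otimes \mathbb{1} & \mathbb{1} \otimes \sigma_z & \sigma_z \otimes \sigma_z \\
\mathbb{1} \otimes \sigma_x & \sigma_x \otimes \mathbb{1} & \sigma_x \otimes \sigma_x \\
\sigma_z \otimes \sigma_x & \sigma_x \otimes \sigma_z & \sigma_y \otimes \sigma_y
\end{pmatrix} \tag{1.10}$$
be the Peres-Mermin square, where $\sigma_x$, $\sigma_y$, and $\sigma_z$ are Pauli matrices. Note that observables $A_{ij}$ that lie in the same row or column always commute, so they are simultaneously measurable, and we should be justified in assigning them a predefined value $\mu(A_{ij}) = \xi_{A_{ij}}(\lambda) \in \{-1, +1\}$. But also note that the product of the observables in each row or column is always plus or minus the identity, a relation that our predefined values should also respect. More specifically, this reasoning leads us to the relations
$$\begin{aligned}
\mu(\sigma_z \otimes \mathbb{1})\,\mu(\mathbb{1} \otimes \sigma_z)\,\mu(\sigma_z \otimes \sigma_z) &= +1,\\
\mu(\mathbb{1} \otimes \sigma_x)\,\mu(\sigma_x \otimes \mathbb{1})\,\mu(\sigma_x \otimes \sigma_x) &= +1,\\
\mu(\sigma_z \otimes \sigma_x)\,\mu(\sigma_x \otimes \sigma_z)\,\mu(\sigma_y \otimes \sigma_y) &= +1,\\
\mu(\sigma_z \otimes \mathbb{1})\,\mu(\mathbb{1} \otimes \sigma_x)\,\mu(\sigma_z \otimes \sigma_x) &= +1,\\
\mu(\mathbb{1} \otimes \sigma_z)\,\mu(\sigma_x \otimes \mathbb{1})\,\mu(\sigma_x \otimes \sigma_z) &= +1,\\
\mu(\sigma_z \otimes \sigma_z)\,\mu(\sigma_x \otimes \sigma_x)\,\mu(\sigma_y \otimes \sigma_y) &= -1.
\end{aligned}$$

Note now that each predefined value appears exactly twice in the left-hand sides, so the product of all of them must be $+1$. But the product of the right-hand sides is $-1$, a contradiction.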
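The operator identities and the impossibility of a value assignment can both be verified directly. This sketch uses plain nested lists instead of a linear algebra library, a choice made only to keep it dependency-free; the helper names are hypothetical. It checks that each row and column of (1.10) consists of pairwise commuting observables, recovers the signs of the six products, and counts the ±1 assignments satisfying all six relations.

```python
from itertools import product

# Pauli matrices and the 2x2 identity as nested lists.
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]


def kron(a, b):
    """Tensor product of two 2x2 matrices (row index 2*i + k, column 2*j + l)."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# The Peres-Mermin square (1.10).
SQUARE = [
    [kron(SZ, I2), kron(I2, SZ), kron(SZ, SZ)],
    [kron(I2, SX), kron(SX, I2), kron(SX, SX)],
    [kron(SZ, SX), kron(SX, SZ), kron(SY, SY)],
]
ROWS = SQUARE
COLUMNS = [[SQUARE[r][c] for r in range(3)] for c in range(3)]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(4) for j in range(4))


def lines_commute():
    """Observables in the same row or column commute pairwise."""
    return all(close(matmul(a, b), matmul(b, a))
               for line in ROWS + COLUMNS for a in line for b in line)


def product_sign(line, tol=1e-12):
    """Return s in {+1, -1} such that the product of the line equals s times the identity."""
    m = matmul(matmul(line[0], line[1]), line[2])
    for s in (1, -1):
        if close(m, [[s if i == j else 0 for j in range(4)] for i in range(4)], tol):
            return s
    raise ValueError("product is not plus or minus the identity")


ROW_SIGNS = [product_sign(row) for row in ROWS]
COLUMN_SIGNS = [product_sign(col) for col in COLUMNS]


def count_value_assignments():
    """Count +-1 assignments mu(A_ij) satisfying all six product relations."""
    count = 0
    for values in product((1, -1), repeat=9):
        mu = [values[0:3], values[3:6], values[6:9]]
        rows_ok = all(mu[r][0] * mu[r][1] * mu[r][2] == ROW_SIGNS[r] for r in range(3))
        cols_ok = all(mu[0][c] * mu[1][c] * mu[2][c] == COLUMN_SIGNS[c] for c in range(3))
        if rows_ok and cols_ok:
            count += 1
    return count


print(lines_commute(), ROW_SIGNS, COLUMN_SIGNS)  # True [1, 1, 1] [1, 1, -1]
print(count_value_assignments())                 # 0
```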



1.8 Ontological excess baggage 

What motivated Bell to prove his famous theorem was his observation that the ontological theory of de Broglie-Bohm [ ] has a grossly nonlocal character [ ]. A natural question for him was, then, whether this nonlocality was particular to Bohm's mechanics or actually a general feature of any ontological theory [24].

In that same paper, however, Bell also noticed that to study a spin system within Bohm's theory he had to include the position degree of freedom, and reduce spin measurements to position measurements. But by doing so he enlarged the number of real parameters required to describe a single qubit from two to a countable infinity, and, worse, the number of ontological states had to be uncountably infinite.

Hardy then asked whether this is a general feature of ontological theories, or just a particularity of Bell's model for a spin in Bohm's theory, and found that it is indeed general [ ], naming this feature ontological excess baggage. His theorem is the subject of this section.

A perhaps simpler (and certainly more direct) illustration of the ontological excess baggage theorem can be found in the naive ontological theory described in section 1.2.1, where we identify the ontic space $\Lambda$ with the space of pure states $\mathcal{P}\mathcal{H}$, thus forcing $\Lambda$ to have the same cardinality as it, that is, to be uncountably infinite.
Theorem 18 (Hardy [10]). In any ontological embedding of quantum theory the ontic space $\Lambda$ is infinite.

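Proof. Let $\psi$ and $\phi$ be two pure quantum states, $\psi \neq \phi$. Then there is a measurement $\{\psi, \mathbb{1} - \psi\}$ for which $\operatorname{tr}(\psi\psi) = 1$, whereas $\operatorname{tr}(\psi\phi) < 1$. Writing these measurements ontologically, we have
$$\int_\Lambda d\lambda\, \mu_\psi(\lambda)\, \xi_\psi(\lambda) = 1 \quad\text{and}\quad \int_\Lambda d\lambda\, \mu_\phi(\lambda)\, \xi_\psi(\lambda) < 1,$$
that is, $\xi_\psi(\lambda) = 1$ for all $\lambda$ in the support of $\mu_\psi(\lambda)$, but there is a $\lambda_0$ in the support of $\mu_\phi(\lambda)$ for which $\xi_\psi(\lambda_0) < 1$. Consequently, $\lambda_0$ is not in the support of $\mu_\psi(\lambda)$, and we see that the ontic states of different pure states must have different supports. This constitutes an injection of $\mathcal{P}\mathcal{H}$ into $\mathcal{P}(\Lambda)$, the power set of $\Lambda$, thus proving that $\mathcal{P}(\Lambda)$ is uncountable. This is only possible if $\Lambda$ itself is infinite (though not necessarily uncountable). □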

This proof is based on the one presented in [47] 27.

One might wonder whether this argument can be extended to show that $\Lambda$ must be uncountable; after all, in all our examples it is, and we have not used all the information we have: notice that the support of $\mu_\phi$ never contains the support of $\mu_\psi$ (they are pairwise incomparable), so we have an injection into a subset of $\mathcal{P}(\Lambda)$, which might have a smaller cardinality than $\mathcal{P}(\Lambda)$ itself. But this hope is unfounded: there is a set $Z$ of subsets of $\mathbb{N}$ that has pairwise incomparable members but the cardinality of the continuum. This was proved by Martin Goldstern as an answer to a MathOverflow question by the author [48].

Theorem 19. There is a set $Z$ of subsets of $\mathbb{N}$ that has pairwise incomparable members but the cardinality of the continuum.

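Proof. For any subset $A \subseteq \mathbb{N}$, let $X_A = \{2n : n \in A\}$, $Y_A = \{2n + 1 : n \notin A\}$, and $Z_A = X_A \cup Y_A$. Then the set of all $Z_A$ is uncountable, since $A \mapsto Z_A$ is an injection of $\mathcal{P}(\mathbb{N})$ into it. Also note that $X_A$ contains only even numbers and $Y_A$ only odd ones, and therefore $Z_A \subseteq Z_B$ implies that $X_A \subseteq X_B$ and $Y_A \subseteq Y_B$. The first inclusion implies $A \subseteq B$, and the second implies $\mathbb{N} \setminus A \subseteq \mathbb{N} \setminus B$, i.e., $B \subseteq A$; hence $A = B$. So the $Z_A$ are pairwise incomparable. □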
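The construction in this proof can be illustrated on the finite truncation $\{0, \dots, n-1\}$ of $\mathbb{N}$. In the sketch below (function names are hypothetical), `z_set` builds $Z_A$ and `check_family` verifies that $A \mapsto Z_A$ is injective and that the resulting sets are pairwise incomparable. The truncation only illustrates the mechanism; the cardinality statement concerns the infinite case.

```python
from itertools import combinations


def z_set(a, n):
    """Z_A = X_A | Y_A with X_A = {2k : k in A} and Y_A = {2k+1 : k not in A}, for k < n."""
    x = {2 * k for k in a}
    y = {2 * k + 1 for k in range(n) if k not in a}
    return frozenset(x | y)


def all_subsets(n):
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]


def check_family(n):
    """Return (number of sets, injectivity of A -> Z_A, pairwise incomparability)."""
    family = {a: z_set(a, n) for a in all_subsets(n)}
    injective = len(set(family.values())) == len(family)
    incomparable = all(not (s <= t or t <= s)
                       for s, t in combinations(family.values(), 2))
    return len(family), injective, incomparable


print(check_family(5))  # (32, True, True)
```

In the truncation every $Z_A$ has exactly $n$ elements, which by itself already forces incomparability; in $\mathbb{N}$ that role is played by the complementary odd part $Y_A$.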

But why is Hardy's theorem interesting? After all, if we're not bothered by the fact that the set of quantum states $\mathcal{D}(\mathcal{H})$ is uncountable, why should we be bothered by the fact that $\Lambda$ is infinite? It all has to do with the status of the pure states. If they're not ontological, the description of the Bloch ball as a convex subset of a three-dimensional real vector space is perfectly natural. But if we insist on giving ontological status to $|0\rangle$ and $|+\rangle$, the identification of the preparation procedures $\{(\tfrac12, |0\rangle), (\tfrac12, |1\rangle)\}$ and $\{(\tfrac12, |+\rangle), (\tfrac12, |-\rangle)\}$ (or, ontologically speaking, of the states $\tfrac12\mu_0 + \tfrac12\mu_1$ and $\tfrac12\mu_+ + \tfrac12\mu_-$) becomes a mystery. In fact, if we remember theorem 14, we know that we can't make this identification, as it is precisely the assumption of preparation noncontextuality, which we showed to be untenable. But if we don't make this identification, the Bloch ball must



27 Note that Spekkens' claim that $\Lambda$ itself is uncountable is incorrect.
explode: the set of all ontic states must be the infinite-dimensional set of probability distributions over the pure states.



1.9 How to make an ontological theory? 

In light of all these no-go theorems (and "please-don't-go" theorems), one is left to wonder how ugly deterministic ontological models that reproduce all of quantum theory (as opposed to the restricted models presented in sections 1.2.2, 1.3.1, and appendix A) would look. In fact, they don't look so bad on paper, as their necessary ugliness is more philosophical than mathematical. There are, of course, models that are quite intricate, such as de Broglie-Bohm's theory. We shall ignore it, however, as we feel that an appropriate exposition of it would be too much of a digression. What we shall present is the contextual model proposed by Bell in his critique of the Gleason theorem [ ], together with a $\psi$-epistemic modification of it [49].

1.9.1 The Bell model 

This $\psi$-ontic model was proposed by Bell in [4]; we present it here as rendered in [49].

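The ontic space for this model is $\Lambda = \mathcal{P}\mathcal{H} \times [0,1]$, the ontic state is
$$\mu_\psi(\lambda_\psi, \lambda) = \delta(\lambda_\psi - \psi)$$
(so that $\lambda$ is uniformly distributed on $[0,1]$), and the response functions are given by
$$\xi_{k|\phi}(\lambda_\psi, \lambda) = \left[\, \sum_{i=0}^{k-1} \operatorname{tr}\lambda_\psi\phi_i < \lambda \le \sum_{i=0}^{k} \operatorname{tr}\lambda_\psi\phi_i \,\right],$$
where $[\,\cdot\,]$ are Iverson brackets 28, the empty sum $\sum_{i=0}^{-1} \operatorname{tr}\lambda_\psi\phi_i$ is 0, and normalization requires us to set $\xi_{0|\phi}(\lambda_\psi, 0) = 1$. Note that for $\dim\mathcal{H} = 2$ this model reduces to the one discussed in section 1.2.2.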

This model is easily seen to be contextual, since $\xi_{k|\phi}(\lambda_\psi, \lambda)$ depends nontrivially on the whole PVM $\phi$.
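As a sanity check of the Bell model, the following sketch integrates the response functions numerically over $\lambda \in [0, 1]$ (midpoint rule) for a qutrit, recovering the Born probabilities. It also exhibits the contextuality: the same projector $e_2$, placed at the same position in two different PVMs, gets different values at the same ontic state. The state, the bases, the grid size and the helper names are our own illustrative choices.

```python
import math


def born(psi, phi):
    """tr(psi phi) = |<phi|psi>|^2 for normalized vectors."""
    return abs(sum(complex(b).conjugate() * p for b, p in zip(phi, psi))) ** 2


def response(k, psi, basis, lam):
    """xi_{k|phi}(psi, lam) = [sum_{i<k} p_i < lam <= sum_{i<=k} p_i]."""
    probs = [born(psi, phi) for phi in basis]
    lower = sum(probs[:k])
    upper = lower + probs[k]
    if lam == 0:  # normalization convention: xi_{0|phi}(psi, 0) = 1
        return 1 if k == 0 else 0
    return 1 if lower < lam <= upper else 0


def model_probabilities(psi, basis, n_points=3000):
    """Integrate each response function over lam in [0, 1] with the midpoint rule."""
    counts = [0] * len(basis)
    for m in range(n_points):
        lam = (m + 0.5) / n_points
        for k in range(len(basis)):
            counts[k] += response(k, psi, basis, lam)
    return [c / n_points for c in counts]


s2, s3 = math.sqrt(2), math.sqrt(3)
psi = [1 / s3, 1 / s3, 1 / s3]
e0, e1, e2 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
f, g = [1 / s2, 1 / s2, 0], [1 / s2, -1 / s2, 0]
context_a = [e0, e2, e1]  # e2 in position 1
context_b = [f, e2, g]    # e2 in position 1, with different companions

print(model_probabilities(psi, context_a))  # approximately [1/3, 1/3, 1/3]
print(response(1, psi, context_a, 0.5), response(1, psi, context_b, 0.5))  # 1 0
```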



1.9.2 The Lewis-Jennings-Barrett-Rudolph model 

This model [49] was proposed as a complement to the PBR theorem (theorem 7), showing that it is in fact possible to make a contextual $\psi$-epistemic model that reproduces quantum mechanics. With it, we complete the discussion of $\psi$-ontic and $\psi$-epistemic models that began in section 1.3.

As this model is a bit complicated, we shall first study its version for dimension 2, in order to clarify the ideas, and then proceed to the general case. The response functions used are the same as in the previous model, whereas the ontic states will be modified in order to become $\psi$-epistemic.

Let $z$ correspond to the north pole of the Bloch sphere, and let $u \cdot z = \cos\theta_u$ define the polar angle $\theta_u$ of a unit vector $u$. Then we can define the northern



28 Defined as $[P] = 1$ if the proposition $P$ is true and $[P] = 0$ otherwise.
hemisphere $\mathcal{N}$ as the set of vectors with $\theta_u < \pi/2$, and label the measurement $\{\phi_0, \phi_1\}$ in such a way that $\theta_{\phi_0} \le \theta_{\phi_1}$.

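The model is based on the observation that if $\psi \in \mathcal{N}$, then the probability $\operatorname{tr}\psi\phi_0$ will be strictly larger than 0 for any measurement $\{\phi_0, \phi_1\}$. This observation has two consequences. The first is that we can define a lower bound $f(\psi)$ for $\operatorname{tr}\psi\phi_0$ that does not depend on $\phi_0$: denoting by $\theta_{\psi\phi_0}$ the angle between the Bloch vectors of $\psi$ and $\phi_0$, we have $\theta_{\psi\phi_0} \le \theta_\psi + \theta_{\phi_0} \le \theta_\psi + \pi/2 < \pi$, so
$$\operatorname{tr}\psi\phi_0 = \tfrac12\left(1 + \cos\theta_{\psi\phi_0}\right) \ge \tfrac12\left(1 + \cos(\theta_\psi + \pi/2)\right) = f(\psi).$$
The second is that there exists a set of ontic states
$$\Lambda_\mathcal{N} = \{(\lambda_\psi, \lambda) : \lambda_\psi \in \mathcal{N} \text{ and } 0 \le \lambda \le f(\lambda_\psi)\}$$
such that for any state $(\lambda_\psi, \lambda)$ in it we have $\xi_{0|\phi}(\lambda_\psi, \lambda) = 1$. Using all this, we can define an ontic state $\mu_\psi$ for $\psi \in \mathcal{N}$:
$$\mu_\psi(\lambda_\psi, \lambda) = \delta(\lambda_\psi - \psi)\,\Theta(\lambda - f(\psi)) + f(\psi)\, u_{\Lambda_\mathcal{N}}(\lambda_\psi, \lambda),$$
where $\Theta$ is the Heaviside step function and $u_{\Lambda_\mathcal{N}}$ is the uniform distribution on $\Lambda_\mathcal{N}$. Notice that all these states overlap in the set $\Lambda_\mathcal{N}$. The quantum statistics are recovered by
$$\begin{aligned}
p(0|\psi,\phi) &= \int_\Lambda \mu_\psi\, \xi_{0|\phi} \\
&= \int_0^1 d\lambda\, \Theta(\lambda - f(\psi))\,\Theta(\operatorname{tr}\psi\phi_0 - \lambda) + f(\psi) \int_\Lambda u_{\Lambda_\mathcal{N}}\, \Theta(\operatorname{tr}\lambda_\psi\phi_0 - \lambda) \\
&= \operatorname{tr}\psi\phi_0 - f(\psi) + f(\psi) \int_\Lambda u_{\Lambda_\mathcal{N}} \\
&= \operatorname{tr}\psi\phi_0.
\end{aligned}$$

For the case $\psi \notin \mathcal{N}$, we let $\mu_\psi(\lambda_\psi, \lambda) = \delta(\lambda_\psi - \psi)$, as usual.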
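The key step of the qubit construction, namely that $\xi_{0|\phi}$ equals 1 on the whole overlap region $\Lambda_\mathcal{N}$, can be probed numerically. This sketch assumes that "uniform distribution on $\Lambda_\mathcal{N}$" means uniform with respect to surface area on the hemisphere times Lebesgue measure in $\lambda$; it samples $\Lambda_\mathcal{N}$ by rejection, evaluates both terms of $\mu_\psi$ for random states and measurements, and compares the result with the Born rule. All function names are hypothetical.

```python
import math
import random


def born(a, b):
    """tr(psi phi) for qubit pure states with Bloch vectors a and b."""
    return 0.5 * (1 + sum(x * y for x, y in zip(a, b)))


def f(a):
    """Lower bound f(psi) = (1 + cos(theta_psi + pi/2)) / 2 = (1 - sin(theta_psi)) / 2."""
    return 0.5 * (1 - math.sqrt(max(0.0, 1 - a[2] ** 2)))


def phi0(n):
    """Label the measurement along +-n so that phi_0 is the outcome closer to the north pole."""
    return n if n[2] >= 0 else tuple(-x for x in n)


def random_bloch(rng, northern=False):
    """Uniform (area measure) point on the sphere, or on the open northern hemisphere."""
    z = 1 - rng.random() if northern else rng.uniform(-1, 1)
    r, t = math.sqrt(max(0.0, 1 - z * z)), rng.uniform(0, 2 * math.pi)
    return (r * math.cos(t), r * math.sin(t), z)


def sample_overlap_region(rng, size):
    """Rejection sampling of (lambda_psi, lam) uniform on Lambda_N (note lam <= f <= 1/2)."""
    points = []
    while len(points) < size:
        a, lam = random_bloch(rng, northern=True), 0.5 * rng.random()
        if lam <= f(a):
            points.append((a, lam))
    return points


def model_probability_0(a, n, overlap_points):
    """p(0|psi, phi) for psi in N, from the two terms of mu_psi."""
    b0 = phi0(n)
    delta_term = max(0.0, born(a, b0) - f(a))  # lam ranges over [f(psi), tr psi phi_0]
    # Fraction of the overlap region on which xi_{0|phi} = [lam <= tr lambda_psi phi_0] is 1.
    hits = sum(1 for c, lam in overlap_points if lam <= born(c, b0))
    return delta_term + f(a) * hits / len(overlap_points)


rng = random.Random(1)
points = sample_overlap_region(rng, 2000)
worst = 0.0
for _ in range(200):
    a, n = random_bloch(rng, northern=True), random_bloch(rng)
    worst = max(worst, abs(model_probability_0(a, n, points) - born(a, phi0(n))))
print(worst < 1e-9)  # True
```

Because every sampled point of $\Lambda_\mathcal{N}$ passes the test, the estimated overlap contribution is exactly $f(\psi)$, which is what makes the two terms add up to the Born probability.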

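To make the generalization to dimension $d$, label the measurement $\{\phi_i\}$ in such a way that $\operatorname{tr}\Pi\phi_0 \ge \operatorname{tr}\Pi\phi_1 \ge \dots \ge \operatorname{tr}\Pi\phi_{d-1}$, where $\Pi$ is an arbitrary state. Now we want to define the analogue of $\mathcal{N}$, i.e., a set $\mathcal{N}'$ such that for any $\psi$ in it we have $\operatorname{tr}\psi\phi_0 > 0$. To do that, first note that $\operatorname{tr}\Pi\phi_0 \ge 1/d$, since the $\operatorname{tr}\Pi\phi_i$ are the elements of a probability vector. Now note that $\operatorname{tr}\psi\phi_0 = 0$ implies that $\psi \le \mathbb{1} - \phi_0$, so $\operatorname{tr}\Pi\psi \le \operatorname{tr}\Pi(\mathbb{1} - \phi_0) \le 1 - 1/d$, and therefore $\operatorname{tr}\Pi\psi > (d-1)/d$ implies that $\operatorname{tr}\psi\phi_0 > 0$; we thus take $\mathcal{N}' = \{\psi : \operatorname{tr}\Pi\psi > (d-1)/d\}$. With that in hand, we can now proceed to finding the analogue of $f(\psi)$, i.e., a lower bound on $\operatorname{tr}\psi\phi_0$ that does not depend on $\phi_0$. Since its existence is clear, we shall not bother looking for an explicit expression and just call it $f'(\psi)$. The analogue of $\Lambda_\mathcal{N}$ is then
$$\Lambda_{\mathcal{N}'} = \{(\lambda_\psi, \lambda) : \lambda_\psi \in \mathcal{N}' \text{ and } 0 \le \lambda \le f'(\lambda_\psi)\},$$
and the ontic state, for $\psi \in \mathcal{N}'$, is
$$\mu_\psi(\lambda_\psi, \lambda) = \delta(\lambda_\psi - \psi)\,\Theta(\lambda - f'(\psi)) + f'(\psi)\, u_{\Lambda_{\mathcal{N}'}}(\lambda_\psi, \lambda).$$

For $\psi \notin \mathcal{N}'$, we let $\mu_\psi(\lambda_\psi, \lambda) = \delta(\lambda_\psi - \psi)$, as usual. This makes the model not "maximally $\psi$-epistemic", that is, it is not true that for every pair of non-orthogonal states $\psi$ and $\phi$ the ontic states $\mu_\psi$ and $\mu_\phi$ have a non-zero overlap. This raises the question: is a "maximally $\psi$-epistemic" model possible? This question was raised by the authors of [49] themselves, and answered in the affirmative by George Lowther and Scott Aaronson [ ].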
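The two inequalities used above to guarantee $\operatorname{tr}\psi\phi_0 > 0$ in dimension $d$ can be tested numerically. In this sketch, states and bases are random (Gaussian vectors and Gram-Schmidt), and all names are hypothetical. The function `worst_case` returns $\operatorname{tr}\Pi\phi_0$ and the largest value of $\operatorname{tr}\Pi\psi$ over states $\psi$ orthogonal to $\phi_0$. The argument predicts that the first is at least $1/d$ and the second at most $1 - 1/d$.

```python
import random


def inner(u, v):
    """<u|v>, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))


def normalize(v):
    norm = abs(inner(v, v)) ** 0.5
    return [x / norm for x in v]


def gaussian_vector(d, rng):
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]


def random_basis(d, rng):
    """Orthonormal basis via Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < d:
        v = gaussian_vector(d, rng)
        for b in basis:
            c = inner(b, v)
            v = [x - c * y for x, y in zip(v, b)]
        basis.append(normalize(v))
    return basis


def overlap(u, v):
    """tr(|u><u| |v><v|) = |<u|v>|^2."""
    return abs(inner(u, v)) ** 2


def worst_case(d, rng):
    """Return tr(Pi phi_0) and the largest tr(Pi psi) over states psi with tr(psi phi_0) = 0."""
    pi = normalize(gaussian_vector(d, rng))
    basis = sorted(random_basis(d, rng), key=lambda phi: overlap(pi, phi), reverse=True)
    # The maximizing psi is the normalized projection of Pi onto span(phi_1, ..., phi_{d-1}).
    proj = [sum(inner(phi, pi) * phi[i] for phi in basis[1:]) for i in range(d)]
    return overlap(pi, basis[0]), overlap(pi, normalize(proj))


rng = random.Random(0)
for d in (2, 3, 5):
    results = [worst_case(d, rng) for _ in range(300)]
    ok = all(p0 >= 1 / d - 1e-12 and q <= 1 - 1 / d + 1e-12 for p0, q in results)
    print(d, ok)  # 2 True, then 3 True, then 5 True
```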



Chapter 2 

Revealing surrealism 



Make it simple, because I can only 
understand simple things. 

Asher Peres 

Reading the previous chapter must have felt like walking in sand, with 
the definitions and assumptions being challenged and changed all the time. 
This is unfortunate, but necessary for such a discussion of the foundations of 
quantum mechanics. In this chapter, however, we shall use what we learned 
and develop a final definition of contextuality, which will serve as a solid 
foundation for the work ahead. 

Instead of trying to find an ontological embedding of quantum theory, 
we shall just accept that it can't be done, and try to characterize exactly 
which parts of quantum mechanics can't be embedded in a (noncontextual)
ontological theory. We shall do this by examining the probability distributions 
over certain events 1 : if such a probability distribution can't be reproduced by 
a noncontextual ontological theory, we shall deem it truly quantum. What 
for, you ask? These probability distributions will be a resource to do what is 
impossible in classical theories: quantum computation with an exponential 
speedup and quantum distribution of cryptographic keys, among other things. 
In other words, quantum magic. 

2.1 The correct definition of contextuality 

The first thing we need to do is to obtain our final definition of contextuality. As we discussed in section 1.4, we need a definition that is not specifically about quantum mechanics, but instead about probability distributions, as is the case for the definition of locality. This need was recognized by Robert Spekkens in 2005 [11], but he stopped short of meeting it: Spekkens arrived at a definition that talked about ontological models instead of quantum theory. His definition (at least, the part of it about measurements) can easily be turned into a definition that only talks about probability distributions; it shall be our
1 "Which events?", you ask. That is the question; for a partial answer, read the rest of the 
chapter. 
definition 24. However, we shall argue that this definition misses the essential point about contextuality.

This necessity was also recognized by Adam Brandenburger and Noson 
Yanofsky in 2008 [ ], but this work limited itself to translating the various 
notions of contextuality that exist in the literature into statements about 
probability distributions. It did not try to judge them and obtain a final 
definition of contextuality. 

Prompted by the discovery 2 of the Klyachko inequality 3 [52, 53], this important job was finally done by Adán Cabello, Simone Severini, and Andreas Winter in 2010 [54], in a work that unified contextuality with the notion of nonlocality and provided algorithms to calculate all its relevant properties 4. However, the authors did not bother to motivate their definitions, nor even to state them explicitly.

Such foundational work was done by Samson Abramsky and Adam Brandenburger in 2011 [55], where they arrived at a definition of contextuality based on probability theory that allowed it to be unified with the notion of nonlocality 5.

Now, we shall present this definition and argue that it must be the "correct" one. Of course, this statement implies that the definitions discussed in section 1.4 were wrong. In fact, it is quite a surprise that the correct definition took 44 years to appear, since the notion was first discussed in [ ]. One could also argue 6 that it took 50 years [35, 36], or even 148 years [59].

This language is purposefully provocative and should be considered some- 
what tongue-in-cheek, as it does not make sense, strictly speaking, to talk 
about correct or incorrect definitions. We do believe, however, that the new 
definition is a significant improvement over the old ones, as it is already 
proving itself more fruitful. 

Let us begin with our muse, the definition of locality:

Definition 20 (Locality). A set of probability distributions $p(a_i, b_j|A_i, B_j)$, where $A$ and $B$ refer to independent systems, is local if there exist response functions $\xi_{a_i|A_i}(\lambda)$, $\xi_{b_j|B_j}(\lambda)$ and a probability distribution $\mu(\lambda)$ such that
$$p(a_i, b_j|A_i, B_j) = \int_\Lambda d\lambda\, \mu(\lambda)\, \xi_{a_i|A_i}(\lambda)\, \xi_{b_j|B_j}(\lambda). \tag{2.1}$$

This definition was motivated by the belief that "correlations cry out for explanation" [60] or, to put it differently 7, "for those who know $\lambda$ there are no correlations", which could be interpreted as 8
$$\xi_{a_i,b_j|A_i,B_j}(\lambda) = \xi_{a_i|A_i}(\lambda)\, \xi_{b_j|B_j}(\lambda). \tag{2.2}$$
2 Or rather its publication in Physical Review Letters. 

3 Note that these papers claim to exclude any ontological models, including contextual ones. 
This claim is incorrect. 

4 We shall discuss this work in section 2.6. 

5 Unfortunately, the authors have chosen to write this paper in the language of category theory, making it inaccessible to most physicists. A clearer explanation of some of their concepts can be found in [56-58].

6 But we're not going to. 

7 As Marco Tulio does [61]. 

8 Of course, we demand that $p(x|X) = \int_\Lambda d\lambda\, \mu(\lambda)\, \xi_{x|X}(\lambda)$ for every $x, X$, if anything is to make sense.
Note that equation (2.2) can in fact be proved 9 (and, consequently, (2.1)) if we assume that $\xi_{a_i,b_j|A_i,B_j}(\lambda)$ is deterministic and non-signalling:

Definition 21 (No-signalling). We say that a set of probability distributions is non-signalling if for every $A_i$ the marginal
$$p(a_i|A_i, B_j) = \sum_{b_j} p(a_i, b_j|A_i, B_j)$$
does not depend on $B_j$, and likewise the marginal $p(b_j|A_i, B_j)$ does not depend on $A_i$, where $A$ and $B$ refer to independent systems.

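Lemma 22. Every deterministic probability distribution $p(a_i, b_j|A_i, B_j)$ is factorizable, i.e., there exist probability distributions $p(a_i|A_i, B_j)$ and $p(b_j|A_i, B_j)$ such that
$$p(a_i, b_j|A_i, B_j) = p(a_i|A_i, B_j)\, p(b_j|A_i, B_j).$$

Proof. Define the marginals $p(a_i|A_i, B_j) = \sum_{b'_j} p(a_i, b'_j|A_i, B_j)$ and $p(b_j|A_i, B_j) = \sum_{a'_i} p(a'_i, b_j|A_i, B_j)$. Since $p(a_i, b_j|A_i, B_j)$ is nonzero (and therefore equal to 1) for a single pair $(a^*_i, b^*_j)$, these sums reduce to $p(a_i|A_i, B_j) = p(a_i, b^*_j|A_i, B_j)$ and $p(b_j|A_i, B_j) = p(a^*_i, b_j|A_i, B_j)$. Then
$$p(a_i|A_i, B_j)\, p(b_j|A_i, B_j) = p(a_i, b^*_j|A_i, B_j)\, p(a^*_i, b_j|A_i, B_j) = p(a_i, b_j|A_i, B_j),$$
since both sides equal 1 if $(a_i, b_j) = (a^*_i, b^*_j)$ and 0 otherwise. □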

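Theorem 23. If a set of probability distributions is deterministic and non-signalling, then it is local.

Proof. Define $p(a_i, b_j|A_i, B_j) = \xi_{a_i,b_j|A_i,B_j}(\lambda)$. By lemma 22 this distribution is the product of its marginals, and by definition 21 the marginal of $a_i$ does not depend on $B_j$ and the marginal of $b_j$ does not depend on $A_i$, so they can be written as $\xi_{a_i|A_i}(\lambda)$ and $\xi_{b_j|B_j}(\lambda)$. This is equation (2.2), which implies locality. □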
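For binary inputs and outputs, lemma 22 and theorem 23 can be checked exhaustively. The sketch below uses our own encoding, in which a deterministic box maps each setting $(x, y)$, standing for $(A_x, B_y)$, to a single outcome pair $(a, b)$; the function names are hypothetical. It enumerates all 256 such boxes, verifies the factorization of lemma 22 for each, and confirms that the 16 non-signalling ones are exactly those of the local form $\xi_{a|x}\,\xi_{b|y}$.

```python
from itertools import product

INPUTS = (0, 1)
OUTCOMES = (0, 1)
SETTINGS = list(product(INPUTS, INPUTS))  # (x, y) stands for (A_x, B_y)


def deterministic_boxes():
    """Every assignment of a single outcome pair (a, b) to each setting (x, y)."""
    for pairs in product(product(OUTCOMES, OUTCOMES), repeat=len(SETTINGS)):
        yield dict(zip(SETTINGS, pairs))


def p(box, a, b, x, y):
    return 1 if box[(x, y)] == (a, b) else 0


def factorizes_per_setting(box):
    """Lemma 22: p(a,b|x,y) = p(a|x,y) p(b|x,y) for a deterministic box."""
    for x, y in SETTINGS:
        for a, b in product(OUTCOMES, OUTCOMES):
            pa = sum(p(box, a, b2, x, y) for b2 in OUTCOMES)
            pb = sum(p(box, a2, b, x, y) for a2 in OUTCOMES)
            if p(box, a, b, x, y) != pa * pb:
                return False
    return True


def is_non_signalling(box):
    """Definition 21, both directions: a's marginal ignores y, b's marginal ignores x."""
    return (all(box[(x, 0)][0] == box[(x, 1)][0] for x in INPUTS)
            and all(box[(0, y)][1] == box[(1, y)][1] for y in INPUTS))


def is_local_deterministic(box):
    """Theorem 23: p(a,b|x,y) = xi_{a|x} xi_{b|y} with xi read off the box."""
    alice = {x: box[(x, 0)][0] for x in INPUTS}
    bob = {y: box[(0, y)][1] for y in INPUTS}
    return all(p(box, a, b, x, y) == (a == alice[x]) * (b == bob[y])
               for x, y in SETTINGS for a, b in product(OUTCOMES, OUTCOMES))


boxes = list(deterministic_boxes())
ns_boxes = [box for box in boxes if is_non_signalling(box)]
print(len(boxes), all(map(factorizes_per_setting, boxes)))        # 256 True
print(len(ns_boxes), all(map(is_local_deterministic, ns_boxes)))  # 16 True
```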

Therefore, if one believes in determinism and (relativity-enforced) non- 
signalling, there's quite a good justification for the factorizability condition 
expressed in equation (2.2), and therefore for Bell's definition of locality. But 
we see that determinism is just a possible justification for it, and not at all a 
necessary assumption for talking about locality. Without determinism, some 
valid justifications for factorizability are 

1. Classical theories are factorizable, as can be seen by the Gelfand-Naimark theorem [ ]. After all, the motivation for looking for an ontological theory in the first place was to recover our classical intuition in a quantum setting.

2. We don't demand that $\lambda$ gives us deterministic answers; but without factorizability $\lambda$ does not even explain correlations. And if $\lambda$ does not even explain correlations, why bother with it?

9 This proof seems to be part of the folklore. 
3. A set of probability distributions admits a joint probability distribution if and only if it is factorizable, as proven by the Fine theorem [63] (see our theorem 31).

In fact, in our opinion the best possible justification for the assumption of factorizability is the Fine theorem, as the existence of a global probability distribution is very appealing on physical grounds. It also shows that any factorizable model can be rewritten with deterministic response functions (see corollary 32), so there is in fact nothing else to justify.

The Fine theorem shall be our final aim when adapting this discussion to contextuality. We start, however, from humbler considerations. First notice that the definition of no-signalling (definition 21) does not require any idle talk about relativity if we do not require that $A_i$ and $B_j$ belong to separate parties, just that they can be measured simultaneously (which is the only prerequisite for talking about their joint distribution). If we rewrite it like this, we end up with a version of Bell's definition of contextuality for probability distributions:

Definition 24 (Wrong). We say that a set of probability distributions is noncontextual if for every $A_i$ the marginal
$$p(a_i|A_i, A_j) = \sum_{a_j} p(a_i, a_j|A_i, A_j)$$
does not depend on $A_j$.

It is also fair to consider this definition to be a version of Spekkens' definition of measurement contextuality for probability distributions. But we know that this definition is not enough for locality: if we do not also assume factorizability (or determinism), all hell breaks loose, and it becomes trivial to construct models that violate locality. In fact, notice that the trivial ontological model discussed in section 1.2.1, which is neither factorizable nor deterministic, violates locality; and that by this limited definition of contextuality it would be considered noncontextual, a truly unacceptable proposition. That is why we call these definitions wrong: they are just a generalization of no-signalling. Certainly desirable and useful, but not the whole story.

Following [64], we shall call this generalized no-signalling property no-disturbance:

Definition 25 (No-disturbance). We say that a set of probability distributions respects no-disturbance if for every $A_i$ the marginal
$$p(a_i|A_i, A_j) = \sum_{a_j} p(a_i, a_j|A_i, A_j)$$
does not depend on $A_j$.

The full definition of noncontextuality follows from joining no-disturbance 
with factorizability, mirroring the definition of locality: 
Definition 26 (Contextuality). A set of probability distributions $p(a_i, a_j|A_i, A_j)$ is noncontextual if there is a probability distribution $\mu(\lambda)$ and response functions $\xi_{a_i|A_i}(\lambda)$ such that
$$p(a_i, a_j|A_i, A_j) = \int_\Lambda d\lambda\, \mu(\lambda)\, \xi_{a_i|A_i}(\lambda)\, \xi_{a_j|A_j}(\lambda).$$

Note that this definition is not quite revolutionary, as most works on contextuality only considered deterministic noncontextuality. Its great value comes from the clarity it provides, particularly on the issue of non-deterministic models: it becomes immediately obvious how to allow for nondeterminism without trivializing our requirements, and it shows that the discussion on whether the response functions associated to effects must be deterministic is completely irrelevant. In fact, with it we can ask whether POVMs can be useful to observe contextuality, a question hitherto unexplored.

Furthermore, it should be clear that this definition is exactly the same as the definition of locality, modulo the restriction that $A_i$ and $A_j$ are observables on separate subsystems; so locality is just an (interesting) particular case of noncontextuality 10. We shall therefore only talk about contextuality and noncontextuality, restricting our attention to locality when it is interesting. Notice also that although we only talk about pairs of jointly measurable observables, this definition naturally extends to sets of any (finite) size, with a corresponding extension to multipartite locality.

To complete the discussion of contextuality, the only thing lacking is a 
Fine theorem for noncontextual distributions. By now it should be obvious 
that it must exist, but we prefer to stop here and establish some notation and 
formalize what we already have, in order to be able to give a more precise 
statement. The theorem shall be proved in the next section. 

2.2 The marginal problem 

The notation and definitions in this section are from [55, 57, 58], and are just a formalization of the discussion of the previous section.

Let $X = \{X_0, \dots, X_{k-1}\}$ be a set of random variables.

Definition 27 (Marginal scenario). A marginal scenario $\mathcal{C}$ is a collection $\mathcal{C} = \{C_0, \dots, C_{n-1}\}$ of subsets $C_i \subseteq X$ such that $C' \subseteq C_i$ implies $C' \in \mathcal{C}$.

The motivation behind this definition is to define which subsets of $X$ can be measured simultaneously, in order to actually measure them and generate the probability distributions that will be tested for compatibility. We call the subsets $C_i$ contexts, and $\mathcal{C}$ is the set of all measurable contexts. Note that in quantum mechanics the contexts will be precisely the subsets of $X$ whose elements commute pairwise.

An interesting particular case is that of Bell scenarios: 

10 Note that even when one is only interested in tests of noncontextuality, this particular case is quite useful, since spatial separation is a good experimental technique to ensure compatibility of the measured observables.
Definition 28 (Bell scenario). We say that a marginal scenario $\mathcal{C}$ is a (bipartite) Bell scenario when there is a partition of $X$ into two sets $A = \{A_i\}$ and $B = \{B_j\}$ such that each context $C_i \in \mathcal{C}$ contains at most one observable from $A$ and at most one observable from $B$. The multipartite case can be defined in the same fashion.

Note that each context will be of the form $C_i = \{A_k, B_l\}$ (plus the singletons $C_i = \{A_k\}$ or $C_i = \{B_l\}$), so we can always implement this scenario in quantum mechanics via a tensor product structure, i.e., by representing the observables as $A_k \otimes \mathbb{1}$ and $\mathbb{1} \otimes B_l$. It then becomes possible to consider $A$ and $B$ as independent, spatially separated quantum systems, and to make the measurements of $A_k$ and $B_l$ with a space-like separation. In this way, each choice of context can be justified by an assumption of causality. A natural example of a Bell scenario is the CHSH scenario 11, where
$$\mathcal{C}_{\text{CHSH}} = \{\{A_0\}, \{A_1\}, \{B_0\}, \{B_1\}, \{A_0, B_0\}, \{A_0, B_1\}, \{A_1, B_0\}, \{A_1, B_1\}\}.$$

This definition is only interesting because there are marginal scenarios where one cannot justify the choice of contexts by arguing that they are measurements on independent subsystems. Such scenarios are useful for proofs of contextuality, not of nonlocality. An interesting example is the Klyachko scenario 12, where
$$\mathcal{C}_K = \{\{A_0\}, \{A_1\}, \{A_2\}, \{A_3\}, \{A_4\}, \{A_0, A_1\}, \{A_1, A_2\}, \{A_2, A_3\}, \{A_3, A_4\}, \{A_4, A_0\}\}.$$
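The two example scenarios can be encoded directly as sets of contexts. This sketch (helper names hypothetical) builds $\mathcal{C}_{\text{CHSH}}$ and $\mathcal{C}_K$ from their maximal contexts, checks the subset-closure of definition 27, and searches for a partition as in definition 28. It finds one for the CHSH scenario and none for the Klyachko scenario, whose pairs form a cycle of length five and therefore cannot be split between two parties.

```python
from itertools import combinations, product


def downward_closure(maximal_contexts):
    """All non-empty subsets of the given contexts (definition 27)."""
    closed = set()
    for ctx in maximal_contexts:
        for r in range(1, len(ctx) + 1):
            closed.update(frozenset(c) for c in combinations(sorted(ctx), r))
    return closed


def is_marginal_scenario(contexts):
    """Every non-empty proper subset of a context is itself a context."""
    return all(frozenset(sub) in contexts
               for ctx in contexts for r in range(1, len(ctx))
               for sub in combinations(ctx, r))


def bell_partition(contexts):
    """Search for a split of X into A and B with at most one element of each per context."""
    variables = sorted(set().union(*contexts))
    for sides in product((0, 1), repeat=len(variables)):
        side = dict(zip(variables, sides))
        if all(sum(side[v] == s for v in ctx) <= 1 for ctx in contexts for s in (0, 1)):
            return ({v for v in variables if side[v] == 0},
                    {v for v in variables if side[v] == 1})
    return None


CHSH = downward_closure([{"A0", "B0"}, {"A0", "B1"}, {"A1", "B0"}, {"A1", "B1"}])
KLYACHKO = downward_closure([{f"A{i}", f"A{(i + 1) % 5}"} for i in range(5)])

print(len(CHSH), is_marginal_scenario(CHSH), bell_partition(CHSH))
print(len(KLYACHKO), is_marginal_scenario(KLYACHKO), bell_partition(KLYACHKO))  # 10 True None
```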

There is still a third interesting case, a partial Bell scenario, where it is still natural to define two subsystems, but we can justify only some of the contexts by an assumption of causality, not all. A trivial example of such a scenario would be joining $\mathcal{C}_K$ with an observable $B_0$ that can be in every context of $\mathcal{C}_K$. A more interesting example would be joining $\mathcal{C}_K$ with a copy of itself $\mathcal{C}'_K$, where we assume that every observable in the first scenario can be in a context with every observable in the second scenario. In this case, we can have violations of both noncontextuality and locality, with some violations of noncontextuality not implying a violation of locality. But we are getting ahead of ourselves; to properly define what we mean by a violation we need a method of assigning probabilities to marginal scenarios and a definition of noncontextuality and locality within this formalism.

Definition 29 (Marginal model 13 ). A marginal model $\mathcal{C}_p$ of a marginal scenario $\mathcal{C}$ is an assignment of probability distributions $C_i \mapsto p(c_i|C_i)$ such that 14
$$C_i \subseteq C_j \implies \sum_{c_j \setminus c_i} p(c_j|C_j) = p(c_i|C_i).$$

That is, for every context $C_i$ we assign a probability distribution $p(c_i|C_i)$, where $c_i$ is a vector of possible answers to the random variables contained

11 Which shall be discussed in section 2.5.2.
12 Which shall be discussed in section 2.5.3.

13 Alternative names for marginal models are behaviour [65] and box [66]. 
14 With a slight abuse of notation. 
within $C_i$. Note that this rather minimal compatibility condition on the marginals of the probability distributions is just the no-disturbance condition (definition 25). We chose to demand it because marginal models that violate no-disturbance are trivially contextual, and we want to restrict our attention to the interesting cases.
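A small illustration of the compatibility condition of definition 29, written with exact fractions. As test data we use the PR box, a standard example of a no-signalling marginal model on the CHSH scenario that does not appear in the text, together with a deliberately altered copy that violates no-disturbance. The representation (contexts as sorted tuples of variable names, outcomes as tuples in the same order) and the function names are our own choices.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def pr_box_model():
    """Marginal model on the CHSH scenario: p(a,b|A_x,B_y) = 1/2 if a XOR b == x*y."""
    model = {}
    for x, y in product((0, 1), repeat=2):
        model[(f"A{x}", f"B{y}")] = {
            (a, b): HALF if (a ^ b) == x * y else Fraction(0)
            for a, b in product((0, 1), repeat=2)
        }
    for v in ("A0", "A1", "B0", "B1"):
        model[(v,)] = {(0,): HALF, (1,): HALF}
    return model


def marginalize(dist, variables, keep):
    """Sum a context distribution over the variables not in `keep`."""
    out = {}
    for outcome, prob in dist.items():
        key = tuple(val for var, val in zip(variables, outcome) if var in keep)
        out[key] = out.get(key, 0) + prob
    return out


def respects_no_disturbance(model):
    """Check the compatibility condition for every pair of contexts C' strictly inside C."""
    for ctx, dist in model.items():
        for sub, sub_dist in model.items():
            if sub != ctx and set(sub) <= set(ctx):
                if marginalize(dist, ctx, set(sub)) != sub_dist:
                    return False
    return True


pr = pr_box_model()
print(respects_no_disturbance(pr))  # True

disturbing = pr_box_model()
disturbing[("A0", "B1")] = {(0, 0): Fraction(1), (0, 1): Fraction(0),
                            (1, 0): Fraction(0), (1, 1): Fraction(0)}
print(respects_no_disturbance(disturbing))  # False
```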

The reason for this definition is that we can assign these probability distributions to the contexts in an empirical manner (for example, from quantum mechanical measurements), opening up the possibility of an experimental test of locality and noncontextuality.

With the definition of a marginal model, it becomes possible to state the 
definition of contextuality within this formalism: 

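Definition 30 (Contextuality). A marginal model is noncontextual if there are response functions $\xi_{x_i|X_i}(\lambda)$ and a probability distribution $\mu(\lambda)$ such that for every $C_i \in \mathcal{C}$
$$p(c_i|C_i) = \int_\Lambda d\lambda\, \mu(\lambda) \prod_{X_n \in C_i} \xi_{x_n|X_n}(\lambda).$$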

Naturally, we say that a marginal model is contextual if it is not noncon- 
textual. Note that the definition of locality is the same, with the restriction 
that the marginal scenario is actually a Bell scenario; analogously, we say that 
a marginal model is nonlocal if it is not local. 

Having definition 30, we can state and prove the generalized Fine theorem 
that motivates it 15 : 

Theorem 31 (Fine [55, 63, 67]). A marginal model $\mathcal{C}_p$ is noncontextual iff there exists a probability distribution $p(x|X)$ such that for every $C_i \in \mathcal{C}$
$$p(c_i|C_i) = \sum_{x \setminus c_i} p(x|X).$$

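Proof. ($\Rightarrow$) By noncontextuality, there are response functions $\xi_{x_i|X_i}(\lambda)$ and a probability distribution $\mu(\lambda)$ such that for every $C_i \in \mathcal{C}$
$$p(c_i|C_i) = \int_\Lambda d\lambda\, \mu(\lambda) \prod_{X_n \in C_i} \xi_{x_n|X_n}(\lambda).$$
Define
$$p(x|X) = \int_\Lambda d\lambda\, \mu(\lambda) \prod_{X_n \in X} \xi_{x_n|X_n}(\lambda).$$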



15 It was first considered by Liang et al. [67] and proved by Abramsky et al. [55].

16 In fact, the motivation is so strong that some prefer to consider definition 30 as defining
"objective reality" instead of noncontextuality [ ]. Although we agree that this interpretation is 
not inappropriate, we prefer to avoid such dramatic terms. 
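Then any marginal $p(c_i|C_i)$ is given by

$$p(c_i|C_i) = \sum_{x \setminus c_i} p(x|X) = \int_\Lambda d\lambda\,\mu(\lambda) \sum_{x \setminus c_i} \prod_{X_n \in X} \xi_{x_n|X_n}(\lambda) = \int_\Lambda d\lambda\,\mu(\lambda) \prod_{X_n \in C_i} \xi_{x_n|X_n}(\lambda),$$

where in the last step we used that each response function is normalized,
$\sum_{x_n} \xi_{x_n|X_n}(\lambda) = 1$, so summing over the outcomes of the variables outside $C_i$
leaves only the factors with $X_n \in C_i$.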

($\Leftarrow$) Conversely, every probability distribution $p(x|X)$ can be written as a convex
combination of deterministic points, so let

$$p(x|X) = \int_\Lambda d\lambda\,\mu(\lambda)\,\delta_{x|X}(\lambda).$$



Since deterministic probability distributions are factorizable (lemma 22),
we can write

$$p(x|X) = \int_\Lambda d\lambda\,\mu(\lambda) \prod_{X_n \in X} \delta_{x_n|X_n}(\lambda).$$
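By assumption, $p(c_i|C_i) = \sum_{x \setminus c_i} p(x|X)$, so

$$p(c_i|C_i) = \int_\Lambda d\lambda\,\mu(\lambda) \sum_{x \setminus c_i} \prod_{X_n \in X} \delta_{x_n|X_n}(\lambda) = \int_\Lambda d\lambda\,\mu(\lambda) \prod_{X_n \in C_i} \delta_{x_n|X_n}(\lambda).$$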



□ 
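The second half of the proof is constructive: summing a global distribution over the variables outside each context yields a marginal model, and no-disturbance then holds automatically. The Python sketch below illustrates this for the three variables used later in section 2.3. Encoding outcomes as tuples of ±1 and contexts as tuples of variable names is our own choice, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def marginal(dist, variables, context):
    """Sum `dist` (outcome tuples ordered as `variables` -> probability)
    down to the variables listed in `context`."""
    positions = [variables.index(v) for v in context]
    result = {}
    for outcome, prob in dist.items():
        key = tuple(outcome[i] for i in positions)
        result[key] = result.get(key, 0) + prob
    return result

def marginal_model(global_dist, variables, scenario):
    """Marginal model induced by a global distribution p(x|X), context by context."""
    return {ctx: marginal(global_dist, variables, ctx) for ctx in scenario}

def no_disturbance(model):
    """Definition 25: contexts sharing variables induce the same marginal on them."""
    for c1, c2 in product(model, repeat=2):
        shared = tuple(v for v in c1 if v in c2)
        if shared and (marginal(model[c1], list(c1), shared)
                       != marginal(model[c2], list(c2), shared)):
            return False
    return True

VARIABLES = ["X0", "X1", "X2"]
OS = [("X0",), ("X1",), ("X2",), ("X0", "X1"), ("X1", "X2"), ("X2", "X0")]

# An arbitrary non-uniform global distribution: weights 1/36, ..., 8/36.
OUTCOMES = list(product((+1, -1), repeat=3))
GLOBAL = {o: Fraction(k + 1, 36) for k, o in enumerate(OUTCOMES)}

model = marginal_model(GLOBAL, VARIABLES, OS)
print(no_disturbance(model))  # True
```

Exact fractions are used so that the comparison of marginals in `no_disturbance` is an equality test rather than a floating-point tolerance.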



Note that in the second half of the proof of the Fine theorem the response
functions $\delta_{x_n|X_n}(\lambda)$ are always deterministic, so we can always choose them to be so:

Corollary 32. A marginal model is noncontextual if and only if there are deterministic
response functions $\xi_{x_i|X_i}(\lambda)$ and a probability distribution $\mu(\lambda)$ such that for
every $C_i \in \mathcal{C}$

$$p(c_i|C_i) = \int_\Lambda d\lambda\,\mu(\lambda) \prod_{X_n \in C_i} \xi_{x_n|X_n}(\lambda).$$

This corollary can be viewed as an alternative (equivalent) definition of 
noncontextuality. 

Now we can finally state the problem of separating classical from
quantum:

Problem 1 (Marginal problem). How can we decide whether a given marginal model is
noncontextual or contextual?

This formulation of the problem makes its mathematical treatment much
easier, since there is an extensive literature (and software) on solving the
marginal problem. But perhaps its greatest contribution is ending the debate on
whether contextuality can be observed in a laboratory: one measures a marginal
model, and then it is just a mathematical question whether it is contextual
or not. The "finite-precision" [32, 68] loophole is simply not relevant in this
formulation, as the set of contextual marginal models has non-empty interior.
2.3 A first example

If we only have two random variables, there is nothing interesting to be done,
since either we already have the global distribution, or we can generate it
simply by defining 17

$$p(x_0,x_1|X_0,X_1) = p(x_0|X_0)\,p(x_1|X_1),$$

so the simplest nontrivial scenario must contain at least three random variables.
In fact, there is a nice little example of it, taken from [ ], which took it from
Specker's parable of the over-protective seer, found in [35, 36]. In it, we have
three binary random variables $X_0$, $X_1$, and $X_2$ that are measured pairwise,
and always found to be anti-correlated. Formally, the marginal scenario
is

$$\mathcal{OS} = \{\{X_0\},\{X_1\},\{X_2\},\{X_0,X_1\},\{X_1,X_2\},\{X_2,X_0\}\},$$

and its marginal model $\mathcal{OS}_p$ is (with a slight abuse of notation)

$$\mathcal{OS}_p = \big(p(x_0|X_0),\,p(x_1|X_1),\,p(x_2|X_2),\,p(x_0,x_1|X_0,X_1),\,p(x_1,x_2|X_1,X_2),\,p(x_2,x_0|X_2,X_0)\big), \tag{2.3}$$

which for convenience we arrange in the following tables:

             X_0    X_1    X_2
   p(+)      1/2    1/2    1/2
   p(-)      1/2    1/2    1/2

           X_0,X_1   X_1,X_2   X_2,X_0
   p(+,+)     0         0         0
   p(+,-)    1/2       1/2       1/2
   p(-,+)    1/2       1/2       1/2
   p(-,-)     0         0         0

To see that this marginal model is contextual, we shall use the Fine theorem
(theorem 31), as in [67], by showing that there can be no global probability
distribution $p(x|X)$ with these marginals.

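Theorem 33. The marginal model $\mathcal{OS}_p$ is contextual.

Proof. $p(+,+|X_0,X_1) = 0$ implies that both $p(+,+,+|X_0,X_1,X_2)$ and
$p(+,+,-|X_0,X_1,X_2)$ must be zero. Proceeding in this way with the other
marginals, we can show that all $p(x_0,x_1,x_2|X_0,X_1,X_2)$ are zero: any assignment
of $\pm1$ to three variables gives equal values to at least one of the pairs
$\{X_0,X_1\}$, $\{X_1,X_2\}$, $\{X_2,X_0\}$, and $\mathcal{OS}_p$ assigns probability zero to equal
values on every pair. This is absurd, since the global probabilities must sum to one.
So there is no global probability distribution and by theorem 31 $\mathcal{OS}_p$ is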
contextual. □ 
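The "proceeding in this way" step of the proof can be mechanized. The sketch below (a hypothetical helper written for illustration) lists the global outcomes of an $n$-cycle of dichotomic variables that are not forced to probability zero by perfect anticorrelation on every measured pair. For $\mathcal{OS}$ ($n = 3$) none survive, which is exactly the contradiction used above. For even $n$ the two alternating assignments survive, and mixing them with weight 1/2 reproduces the perfectly anticorrelated marginals, so the argument genuinely depends on the odd length of the cycle.

```python
from itertools import product

def cycle_pairs(n):
    """Contexts {X_i, X_{i+1}} of an n-cycle, indices taken modulo n."""
    return [(i, (i + 1) % n) for i in range(n)]

def surviving_outcomes(n):
    """Global outcomes x in {+1, -1}^n not forced to probability zero by
    p(+,+) = p(-,-) = 0 on every pair, i.e. anticorrelated on all pairs."""
    pairs = cycle_pairs(n)
    return [x for x in product((+1, -1), repeat=n)
            if all(x[i] != x[j] for i, j in pairs)]

print(surviving_outcomes(3))  # []
print(surviving_outcomes(4))  # [(1, -1, 1, -1), (-1, 1, -1, 1)]
```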

17 A moment's thought will convince you that if the marginal scenario contains only the
singletons $X_n$, we can always do this and prove that it is noncontextual.

An interesting question is then whether this contextual marginal model
can be used as a proof of contextuality for quantum mechanics. Unfortunately,
this is not the case: it requires each of the three pairs $\{X_i, X_j\}$ to be jointly
measurable; in quantum mechanics this means that the three observables must
commute pairwise, and therefore $X_0$, $X_1$ and $X_2$ must be jointly measurable,
giving rise to exactly the joint probability distribution that must not exist.
A marginal scenario with three random variables that avoids this problem is

$$\mathcal{V} = \{\{X_0\},\{X_1\},\{X_2\},\{X_0,X_1\},\{X_1,X_2\}\},$$

since it is perfectly possible that $X_1$ commutes with both $X_0$ and $X_2$, while $X_0$
and $X_2$ do not commute. But this marginal scenario is even more trivial
than the previous one, since every marginal model for it is noncontextual 18.
As these two are the only nontrivial marginal scenarios with three
random variables, we need at least four random variables to obtain a
contextual marginal model realizable within quantum mechanics, and in fact
one exists. To explore it, though, we need a bit more structure,
since a direct proof of contextuality à la theorem 33 is feasible only for the
simplest cases. In the next section, we shall develop a general algorithm to
decide whether a given marginal model is contextual or not. 
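For $\mathcal{V}$ the global distribution can always be built explicitly by gluing the two pair distributions along the shared variable $X_1$, which is the construction of footnote 18. A minimal sketch with exact fractions follows; the function names are ours, and the convention of setting the result to zero when $p(x_1|X_1) = 0$ is an assumption needed to make the formula well defined.

```python
from fractions import Fraction
from itertools import product

SIGNS = (+1, -1)

def glue(p01, p12):
    """Global p(x0, x1, x2) = p(x0, x1) p(x1, x2) / p(x1), set to 0 when p(x1) = 0.
    Assumes p01 and p12 agree on the marginal of X1 (no-disturbance)."""
    p1 = {x1: sum(p01[(x0, x1)] for x0 in SIGNS) for x1 in SIGNS}
    return {(x0, x1, x2): (p01[(x0, x1)] * p12[(x1, x2)] / p1[x1]
                           if p1[x1] else Fraction(0))
            for x0, x1, x2 in product(SIGNS, repeat=3)}

def pair_marginal(glob, i, j):
    """Marginal of the global distribution on the variables with indices i and j."""
    out = {(a, b): Fraction(0) for a in SIGNS for b in SIGNS}
    for x, prob in glob.items():
        out[(x[i], x[j])] += prob
    return out

# Perfect correlation on {X0, X1}, perfect anticorrelation on {X1, X2}.
P01 = {(1, 1): Fraction(1, 2), (1, -1): Fraction(0),
       (-1, 1): Fraction(0), (-1, -1): Fraction(1, 2)}
P12 = {(1, 1): Fraction(0), (1, -1): Fraction(1, 2),
       (-1, 1): Fraction(1, 2), (-1, -1): Fraction(0)}

g = glue(P01, P12)
print(pair_marginal(g, 0, 1) == P01, pair_marginal(g, 1, 2) == P12)  # True True
```

Because the missing context $\{X_2, X_0\}$ imposes no constraint, the glued distribution reproduces every marginal of $\mathcal{V}$, so no marginal model of this scenario can be contextual.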

2.4 Boole inequalities 



To be able to solve problem 1, we shall first take a step back and examine its
geometry. We shall see that the sets of marginal models are convex polytopes,
which can be described by a finite set of linear inequalities, so the
question of whether a given marginal model is contextual or not reduces to
checking whether it satisfies all the inequalities for its marginal scenario. This
check can be done efficiently, but with two caveats: obtaining the inequalities for
a given scenario is a difficult problem (albeit one that can be solved by software),
and the number of inequalities for a marginal scenario may increase exponentially
with the number of contexts 19.

In this section we shall need a number of basic results in convex geometry, 
which we shall make no attempt to prove. Instead, we refer the interested 
reader to the excellent book "Lectures on Polytopes" [69]. 



2.4.1 Sets of marginal models 



When satisfied they indicate that the 
data may have, when not satisfied they 
indicate that the data cannot have, 
resulted from actual observation 

George Boole [ ] 



There are, for now, two sets of marginal models that interest us: the set of all
marginal models, and the set of noncontextual marginal models. We shall see
that both are convex polytopes.

Definition 34 (Convex polytope). A convex polytope is a bounded intersection of 
closed halfspaces. 

18 Since we can just define $p(x_0,x_1,x_2|X_0,X_1,X_2) = p(x_0,x_1|X_0,X_1)\,p(x_1,x_2|X_1,X_2)/p(x_1|X_1)$
(and set it to zero whenever $p(x_1|X_1) = 0$).
19 As in the example of section 2.5.
Theorem 35. The set of all marginal models for a given marginal scenario is a convex
polytope.

Proof. Consider the marginal scenario

$$\mathcal{C} = \{C_1,\ldots,C_N\},$$

and a marginal model

$$\mathcal{C}_p = \big(p(c_1|C_1),\ldots,p(c_N|C_N)\big).$$

The fact that each $p(c_i|C_i)$ is a probability distribution is encoded by the linear
inequalities 20 $p(c_i|C_i) \ge 0$ and $\sum_{c_i} p(c_i|C_i) = 1$, and the fact that this set of
probability distributions is a marginal model is encoded by the no-disturbance
condition expressed in definition 25, which is just another set of linear
inequalities. It remains to show that the set is bounded, but this follows from
the fact that each element of $\mathcal{C}_p$ belongs to $[0, 1]$. □

We shall call the set of all marginal models the no-disturbance polytope. 
To see that the set of noncontextual marginal models is also a convex 
polytope, it is easier to use another equivalent 21 definition of convex polytopes: 

Definition 36 (Convex polytope). A convex polytope is the convex hull of a finite
set of points in some $\mathbb{R}^n$.

Theorem 37. The set of all noncontextual marginal models for a given marginal scenario
is a convex polytope.

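Proof. Consider the marginal scenario

$$\mathcal{C} = \{C_1,\ldots,C_N\},$$

and a noncontextual marginal model

$$\mathcal{C}_p = \big(p(c_1|C_1),\ldots,p(c_N|C_N)\big).$$

By the corollary 32 of the Fine theorem 31, there is a probability distribution
$\mu(\lambda)$ and deterministic response functions $\xi_{x_n|X_n}(\lambda)$ such that

$$p(c_i|C_i) = \int_\Lambda d\lambda\,\mu(\lambda) \prod_{X_n \in C_i} \xi_{x_n|X_n}(\lambda),$$

and so

$$\mathcal{C}_p = \int_\Lambda d\lambda\,\mu(\lambda) \Big(\prod_{X_n \in C_1} \xi_{x_n|X_n}(\lambda),\ldots,\prod_{X_n \in C_N} \xi_{x_n|X_n}(\lambda)\Big),$$

that is, $\mathcal{C}_p$ is a convex combination of the points

$$\Big(\prod_{X_n \in C_1} \xi_{x_n|X_n}(\lambda),\ldots,\prod_{X_n \in C_N} \xi_{x_n|X_n}(\lambda)\Big).$$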



20 Remember that the equality $x = k$ is just the combination of the inequalities $x \le k$ and
$x \ge k$.

21 The proof of their equivalence is the famous Minkowski-Weyl theorem.
Since the response functions are deterministic and we are dealing with a finite
number of dichotomic random variables, the number of different points is
finite, and so a noncontextual marginal model is a convex combination of a finite
number of points. Conversely, by corollary 32 every convex combination of these
points is a noncontextual marginal model, so the set of noncontextual marginal
models is exactly their convex hull. □

Analogously, the set of all noncontextual marginal models shall be called 
the noncontextual polytope. 

As a consequence of this proof, we see that the vertices of the noncontextual
polytope are simply the deterministic probability distributions for
the outcomes of each context, and as such they are trivial to find. What we
want to do, then, is to obtain from this list of vertices the linear inequalities
that describe the noncontextual polytope. This is a classical problem in convex
geometry (facet enumeration), and there are plenty of algorithms and software for
solving it. Here we shall use the reverse search algorithm, due to Avis and Fukuda [70],
as implemented in the software lrs [71]. Following Itamar Pitowsky, we call
these inequalities Boole inequalities.

Before exploring them, we need a refinement in our representation of 
marginal models. 



2.4.2 Representing marginal models 

When writing down a marginal model, such as (2.3), one immediately notices
that it has a lot of redundancies. First of all, the joint probability distribution
of a context completely determines the distributions of its subsets, since a
marginal model respects no-disturbance by definition. Furthermore, for each
context one parameter is already determined by normalization, and finally
each random variable is usually shared by two or more contexts, so the joint
probability distributions of different contexts are not independent, as they
might share some marginals.

All these reasons motivate us to find another representation of a marginal
model that already incorporates normalization and no-disturbance. When
using only dichotomic random variables (as we shall do in this thesis), the
best representation is via the expectation value of each context, as these contain
all the information of a marginal model with no redundancies.

Theorem 38. For dichotomic random variables, a marginal model can be represented 
by the expectation values of all contexts with no redundancies. 

Proof. To check this, it is enough to see that all the information present in the
marginal model is preserved when it is translated into expected values, i.e.,
that there is an invertible linear transformation between a marginal model and
the vector of all the allowed expected values. Consider, for instance, the joint
probability distribution for the context $\{X_0, X_1\}$. The transformation is

$$
\begin{pmatrix} 1 \\ \langle X_1\rangle \\ \langle X_0\rangle \\ \langle X_0X_1\rangle \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} p(+,+|X_0,X_1) \\ p(+,-|X_0,X_1) \\ p(-,+|X_0,X_1) \\ p(-,-|X_0,X_1) \end{pmatrix},
\tag{2.4}
$$
and invertibility comes from the fact that the matrix is proportional to its
inverse.

The proof for contexts with more than two random variables comes from
noticing that the matrix which does the linear transformation is a Hadamard
matrix 22. Specifically, the transformation for $n$ random variables can be
recursively defined as follows: let

$$H_1 = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

and define $H_n = H_1 \otimes H_{n-1}$. Then it is easy to check that $H_n$ is always
self-adjoint and $H_n^2 = 2^n\,\mathbb{1}$. Furthermore, if the vector of probabilities is
ordered in the obvious binary way (reading $+$ as 0 and $-$ as 1, with $x_0$ as the most
significant digit), the vector of expected values will have a corresponding order,
i.e., its $k$th element will be $\langle X_0^{a_0} X_1^{a_1} \cdots X_{n-1}^{a_{n-1}}\rangle$, where
$a_0 a_1 \ldots a_{n-1}$ is the binary expansion of $k$. □

As this representation already assumes normalization and no-disturbance,
the only information that it lacks is positivity. Since positivity does not reduce
the number of dimensions, it is not possible to find a representation that
already assumes it. Instead, one enforces it via the inequalities

$$4p(+,+|X_0,X_1) = 1 + \langle X_0\rangle + \langle X_1\rangle + \langle X_0X_1\rangle \ge 0 \tag{2.5a}$$

$$4p(-,+|X_0,X_1) = 1 - \langle X_0\rangle + \langle X_1\rangle - \langle X_0X_1\rangle \ge 0 \tag{2.5b}$$

$$4p(+,-|X_0,X_1) = 1 + \langle X_0\rangle - \langle X_1\rangle - \langle X_0X_1\rangle \ge 0 \tag{2.5c}$$

$$4p(-,-|X_0,X_1) = 1 - \langle X_0\rangle - \langle X_1\rangle + \langle X_0X_1\rangle \ge 0 \tag{2.5d}$$

which are obtained by inverting transformation (2.4). 
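The transformation (2.4), its inverse and the positivity conditions (2.5) are easy to reproduce numerically. A small sketch, assuming the binary ordering described in the proof of theorem 38 ($+$ read as 0, $-$ as 1, first variable most significant); `hadamard`, `to_expectations`, `to_probabilities` and `positivity` are illustrative names of ours.

```python
from fractions import Fraction

H1 = [[1, 1], [1, -1]]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def hadamard(n):
    """H_n = H_1 (x) H_{n-1}, starting from H_0 = [[1]]."""
    h = [[1]]
    for _ in range(n):
        h = kron(H1, h)
    return h

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def to_expectations(probs):
    """Entry k is <X_0^a_0 ... X_{n-1}^a_{n-1}>, with a = binary digits of k."""
    n = len(probs).bit_length() - 1
    return matvec(hadamard(n), probs)

def to_probabilities(expectations):
    """Inverse map, using H_n^2 = 2^n times the identity."""
    n = len(expectations).bit_length() - 1
    return [Fraction(x) / 2 ** n for x in matvec(hadamard(n), expectations)]

def positivity(e0, e1, e01):
    """Left-hand sides 4p of (2.5a)-(2.5d) for one pair, in that order."""
    return [1 + e0 + e1 + e01, 1 - e0 + e1 - e01,
            1 + e0 - e1 - e01, 1 - e0 - e1 + e01]

# Context {X0, X1} of OS_p: p(+,+), p(+,-), p(-,+), p(-,-).
pair = [Fraction(0), Fraction(1, 2), Fraction(1, 2), Fraction(0)]
print([int(v) for v in to_expectations(pair)])  # [1, 0, 0, -1]
```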

Using this representation also gives us some notational convenience: since
we have one expected value for each context, we can define a marginal model
simply by assigning one expected value to each context in a marginal scenario.
For example, the marginal model for the marginal scenario

$$\mathcal{OS} = \{\{X_0\},\{X_1\},\{X_2\},\{X_0,X_1\},\{X_1,X_2\},\{X_2,X_0\}\},$$

originally written as (2.3), shall be

$$\mathcal{OS}_p = \big(\langle X_0\rangle,\langle X_1\rangle,\langle X_2\rangle,\langle X_0X_1\rangle,\langle X_1X_2\rangle,\langle X_2X_0\rangle\big), \tag{2.6}$$

which is easily calculated as

$$\mathcal{OS}_p = (0,0,0,-1,-1,-1). \tag{2.7}$$

Another advantage of this representation is that we can easily distinguish the
statistics that indicate correlations between random variables, such as $\langle X_iX_j\rangle$,
from those that only concern individual random variables, such as $\langle X_i\rangle$. We shall see that
it is quite common to study inequalities that only take into account correlations
between random variables 23: these are called full-correlation inequalities. When
talking about contexts with more than two random variables, this name is
applied only to inequalities that take into account the largest possible contexts.

22 Thanks to Daniel Jonathan for pointing this out.
23 In fact, only these shall be studied in this thesis.
2.4.3 The noncontextual polytope for OS

Now that we have a good representation, we can discuss the first example of
Boole inequalities. We shall obtain them for the marginal scenario OS. The
first thing we need are the vertices of the noncontextual polytope, which are
simply given by the $2^3$ deterministic assignments $\langle X_i\rangle = \pm1$ to the random variables.
Written in the ordering given by equation (2.6), they are

(+,+,+,+,+,+) (-,+,+,-,+,-) 

(+,+,-,+,-,-) (-,+,-,-,-,+) 

(+,-,+,-,-,+) (-,-,+,+,-,-) 

(+,-,-,-,+,-) (-,-,-,+,+,+) 

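where for clarity we have omitted the ones. Inputting these vertices into lrs 24,
we obtain 16 inequalities: 12 are the positivity conditions (2.5) for each
pair of random variables, and 4 are the Boole inequalities

$$-\langle X_0X_1\rangle - \langle X_1X_2\rangle - \langle X_2X_0\rangle \le 1 \tag{2.8a}$$

$$-\langle X_0X_1\rangle + \langle X_1X_2\rangle + \langle X_2X_0\rangle \le 1 \tag{2.8b}$$

$$+\langle X_0X_1\rangle - \langle X_1X_2\rangle + \langle X_2X_0\rangle \le 1 \tag{2.8c}$$

$$+\langle X_0X_1\rangle + \langle X_1X_2\rangle - \langle X_2X_0\rangle \le 1 \tag{2.8d}$$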

The marginal model $\mathcal{OS}_p$, equation (2.7), is then easily seen to violate inequality
(2.8a), whose left-hand side it brings to 3, and is therefore contextual.

Exactly these same inequalities were obtained by Pitowsky using Boole's 
method [59, 72]. 
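As a sanity check of (2.7) and (2.8), the Python sketch below enumerates the eight noncontextual vertices in the ordering (2.6), confirms that each of them satisfies the four Boole inequalities, counts how many vertices saturate each one, and evaluates the inequalities on $\mathcal{OS}_p$. It does not reproduce the facet enumeration performed by lrs; it only checks the resulting inequalities, and all names are illustrative.

```python
from itertools import product

def os_vertex(a):
    """Noncontextual vertex of OS in the ordering (2.6) for an assignment a = (x0, x1, x2)."""
    x0, x1, x2 = a
    return (x0, x1, x2, x0 * x1, x1 * x2, x2 * x0)

VERTICES = [os_vertex(a) for a in product((+1, -1), repeat=3)]

# Coefficients of (2.8a)-(2.8d) on (<X0X1>, <X1X2>, <X2X0>); each reads gamma . corr <= 1.
BOOLE = {"2.8a": (-1, -1, -1), "2.8b": (-1, +1, +1),
         "2.8c": (+1, -1, +1), "2.8d": (+1, +1, -1)}

def boole_value(gamma, point):
    """Left-hand side of a Boole inequality evaluated on a point in ordering (2.6)."""
    return sum(g * c for g, c in zip(gamma, point[3:]))

OS_P = (0, 0, 0, -1, -1, -1)  # the marginal model (2.7)

for name, gamma in BOOLE.items():
    worst = max(boole_value(gamma, v) for v in VERTICES)
    print(name, "max over vertices:", worst, "value on OS_p:", boole_value(gamma, OS_P))
```

Each inequality is saturated by six vertices, which is the number needed for a facet of this six-dimensional polytope.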

2.5 The n-cycle 

As we have discussed before, it is not possible to violate the Boole inequalities
for the marginal scenario OS with quantum mechanics. However, there is a
natural generalization of this scenario which does have a quantum violation.
Consider the set of random variables $X = \{X_0, \ldots, X_{n-1}\}$, and the marginal
scenario $\mathcal{C}^n$ formed by the singletons $\{X_i\}$ together with the pairs
$\{X_i, X_{i+1}\}$, where the addition is taken modulo $n$. For $n = 3$, $\mathcal{C}^3$
is the marginal scenario OS discussed before. For general $n$ this scenario is
called the n-cycle, as its compatibility 25 graph is an $n$-cycle, as shown in figure
2.2.

The n-cycle marginal scenario is an old problem that has been studied many
times. The 2-cycle was characterized by George Boole in 1862 [59, 72], who
also provided the general algorithm for solving the marginal problem. The
3-cycle was first studied by Ernst Specker in 1960 [35, 36], and characterized

24 For those that do not like this kind of proof, we shall obtain these same inequalities in the 
next section via a parity argument. 

25 The graph that has the random variables as vertices, and whose edges connect random variables
that are in the same context.

26 As we discussed before, in this case the noncontextual polytope coincides with the
no-disturbance polytope, and therefore its facets are only the positivity conditions (2.5).
Figure 2.1: Full-correlation parts of the noncontextual polytope (green
tetrahedron) and of the no-disturbance polytope (black cube) for the marginal
scenario OS. Note that this is a projection onto the last three components.




Figure 2.2: Contexts for the 3-cycle, 4-cycle, and 5-cycle. 



by Itamar Pitowsky in 1989 [ ]. The 4-cycle was characterized by Arthur Fine
in 1982 [63]. The 5-cycle was characterized by Alexander Klyachko in 2002
[52]. The n-cycle for all odd n was studied by Yeong-Cherng Liang, Robert
Spekkens, and Howard Wiseman in 2010 [67], and also by Adán Cabello,
Simone Severini, and Andreas Winter in the same year [ ]. The general n-cycle
was studied by Rafael Chaves and Tobias Fritz in 2012, who derived entropic
inequalities which are necessary but not sufficient for noncontextuality for all
n [57, 58]. An answer to the general question was conjectured by Cabello et al.
in 2012 [43]. It will be given here 27.

The Boole inequalities for this scenario can be derived from the simple
algebraic observation that if $\alpha_i = \pm1$ are the components of an $n$-element vector $\alpha$,
then the vector $\beta$ with $n$ components $\beta_i = \alpha_i\alpha_{i+1}$ (indices modulo $n$) always has an
even number of negative components, since $\prod_i \beta_i = \prod_i \alpha_i^2 = 1$. Therefore, if we
define a third vector $\gamma$ with an odd


27 The results of this and the next section are new [74]. 
number of negative components, then

$$\langle\gamma,\beta\rangle \le n-2, \tag{2.9}$$

since to maximize the inner product we should set $\beta = \gamma$, but this would
force $\beta$ to have an odd number of negative components, which is impossible.
The best we can do then is to flip the sign of a single component, which lowers
the inner product by 2 and gives us the desired bound.

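If we now set $\langle X_i\rangle = \alpha_i$, then $\beta_i = \langle X_iX_{i+1}\rangle$ is the full-correlation part
of the vertices of the noncontextual polytope for this marginal scenario, and
inequality (2.9) becomes the Boole inequality

$$\sum_{i=0}^{n-1} \gamma_i \langle X_iX_{i+1}\rangle \le n-2. \tag{2.10}$$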

Since these are satisfied by the noncontextual vertices, they are also satisfied by
their convex combinations, and so every noncontextual marginal model
respects these inequalities. We claim that these $2^{n-1}$ inequalities are all the
Boole inequalities for the n-cycle. To prove this, we shall check that these
inequalities are actually facets of the noncontextual polytope, and that there
are no more Boole inequalities for the n-cycle. 
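The parity argument behind (2.9) and (2.10) can be checked by brute force for small $n$. The sketch below, with helper names of our own, verifies that every $\beta$ arising from an assignment $\alpha$ has an even number of negative entries, that there are $2^{n-1}$ sign vectors $\gamma$ with an odd number of negative entries, and that the maximum of $\langle\gamma,\beta\rangle$ over the noncontextual vertices is exactly $n - 2$, so the bound is attained.

```python
from itertools import product

def correlations(alpha):
    """beta_i = alpha_i * alpha_{i+1} (indices mod n): full-correlation part of a vertex."""
    n = len(alpha)
    return tuple(alpha[i] * alpha[(i + 1) % n] for i in range(n))

def negatives(v):
    return sum(1 for x in v if x < 0)

def boole_gammas(n):
    """The 2^(n-1) coefficient vectors gamma with an odd number of -1 entries."""
    return [g for g in product((+1, -1), repeat=n) if negatives(g) % 2 == 1]

def classical_max(gamma):
    """max of <gamma, beta> over all 2^n noncontextual vertices of the n-cycle."""
    n = len(gamma)
    return max(sum(g * b for g, b in zip(gamma, correlations(a)))
               for a in product((+1, -1), repeat=n))

for n in range(3, 8):
    assert all(negatives(correlations(a)) % 2 == 0 for a in product((+1, -1), repeat=n))
    assert len(boole_gammas(n)) == 2 ** (n - 1)
    assert all(classical_max(g) == n - 2 for g in boole_gammas(n))
print("parity bound verified for n = 3..7")
```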

Theorem 39. All inequalities (2.10) are facets of the noncontextual polytope for the 
n-cycle. 

Proof. We will check that each Boole inequality (2.10) is saturated by $2n$
affinely independent vertices of the noncontextual polytope, which generate
an affine subspace of dimension $2n-1$. Note that if we flip the sign of any
component $\gamma_i$ of the Boole inequality $\gamma$, then the new vector $\gamma'$ satisfies
$\langle\gamma,\gamma'\rangle = n-2$ and has an even number of negative components, so we have
obtained the full-correlation part of a noncontextual vertex that saturates the
Boole inequality. Since there are two ways of completing the local part of a
noncontextual vertex that are consistent with a given full-correlation part
(namely $\alpha$ and $-\alpha$), and we have $n$ components $\gamma_i$ whose sign can be flipped,
in this manner we obtain $2n$ vertices of the noncontextual polytope that saturate
the Boole inequality $\gamma$. To check
that they are affinely independent is trivial. □ 
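The facet property can also be confirmed numerically for small $n$. The sketch below (helper names are ours) collects the noncontextual vertices $(\alpha, \beta)$ that saturate a given inequality (2.10) and computes their affine dimension with exact rational Gaussian elimination. A facet of the $2n$-dimensional noncontextual polytope must have affine dimension $2n - 1$.

```python
from fractions import Fraction
from itertools import product

def correlations(alpha):
    """beta_i = alpha_i * alpha_{i+1}, indices modulo n."""
    n = len(alpha)
    return tuple(alpha[i] * alpha[(i + 1) % n] for i in range(n))

def rank(rows):
    """Rank of a list of rational vectors, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def affine_dimension(points):
    """Dimension of the affine hull: rank of the differences to the first point."""
    base = points[0]
    return rank([[a - b for a, b in zip(p, base)] for p in points[1:]])

def noncontextual_vertices(n):
    """(<X_0>, ..., <X_{n-1}>, <X_0X_1>, ..., <X_{n-1}X_0>) for each assignment."""
    return [alpha + correlations(alpha) for alpha in product((+1, -1), repeat=n)]

def tight_vertices(gamma):
    """Noncontextual vertices saturating sum_i gamma_i <X_i X_{i+1}> <= n - 2."""
    n = len(gamma)
    return [v for v in noncontextual_vertices(n)
            if sum(g * b for g, b in zip(gamma, v[n:])) == n - 2]

tight = tight_vertices((-1, -1, -1, -1, -1))
print(len(tight), affine_dimension(tight))  # 10 9
```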

To check that there are no more Boole inequalities, we first need to
characterize the contextual vertices of the no-disturbance polytope.

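Theorem 40. The vertices of the no-disturbance polytope are the $2^n$ noncontextual
deterministic marginal models

$$\big(\langle X_0\rangle,\ldots,\langle X_{n-1}\rangle,\ \langle X_0\rangle\langle X_1\rangle,\ldots,\langle X_{n-1}\rangle\langle X_0\rangle\big), \tag{2.11}$$

where $\langle X_i\rangle = \pm1$, together with the $2^{n-1}$ contextual marginal models of the form

$$\big(0,\ldots,0,\ \langle X_0X_1\rangle,\ldots,\langle X_{n-1}X_0\rangle\big), \tag{2.12}$$

where $\langle X_iX_{i+1}\rangle = \pm1$ such that the number of negative components is odd.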
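Proof. By definition, the vertices of the polytope are given by the intersection
of $2n$ independent hyperplanes, i.e., as the unique solution of a set of $2n$
independent linear equations chosen among the $4n$ equations (2.5), written for
each pair $\{X_i, X_{i+1}\}$. The above vertices are obtained by choosing two equations
among (2.5a)-(2.5d) for each index $i$. In particular, contextual vertices are obtained
by choosing equations (2.5a) and (2.5d) for an odd number of indices $i$ and equations
(2.5b) and (2.5c) for the remaining indices. Saturating (2.5a) and (2.5d) forces
$\langle X_iX_{i+1}\rangle = -1$ and $\langle X_{i+1}\rangle = -\langle X_i\rangle$, while (2.5b) and (2.5c) force
$\langle X_iX_{i+1}\rangle = +1$ and $\langle X_{i+1}\rangle = \langle X_i\rangle$; going around the cycle with an odd number
of sign changes then forces all $\langle X_i\rangle = 0$.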

It is straightforward to check that all other possible strategies for obtaining 
a vertex, i.e., involving the choice of 1, 2 or 3 equations for each index i, give 
the same set of vertices. □ 
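The characterization in theorem 40 can be probed numerically. The following sketch (our own helper names; it tests candidate points rather than enumerating all vertices) checks whether a point of $\mathbb{R}^{2n}$ satisfies the positivity conditions (2.5) for every pair $\{X_i, X_{i+1}\}$ and whether the saturated conditions have rank $2n$, i.e. whether the point is a vertex of the no-disturbance polytope. Points of the form (2.12) pass exactly when the number of negative correlations is odd, and such a point gives the value $n$ in the Boole inequality with $\gamma = \beta$, above the bound $n - 2$.

```python
from fractions import Fraction
from itertools import product

# (s_a, s_b) for (2.5a)-(2.5d): 4p = 1 + s_a <X_i> + s_b <X_{i+1}> + s_a s_b <X_i X_{i+1}>.
SIGNS = [(+1, +1), (-1, +1), (+1, -1), (-1, -1)]

def positivity_normals(n):
    """Normals of the 4n positivity conditions, in the coordinates
    (<X_0>, ..., <X_{n-1}>, <X_0X_1>, ..., <X_{n-1}X_0>)."""
    normals = []
    for i in range(n):
        for sa, sb in SIGNS:
            v = [0] * (2 * n)
            v[i] += sa
            v[(i + 1) % n] += sb
            v[n + i] += sa * sb
            normals.append(v)
    return normals

def rank(rows):
    """Rank of a list of rational vectors, by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_vertex(point):
    """True if the point satisfies (2.5) and saturates 2n independent conditions."""
    n = len(point) // 2
    values = [(1 + sum(a * x for a, x in zip(v, point)), v) for v in positivity_normals(n)]
    if any(val < 0 for val, _ in values):
        return False
    return rank([v for val, v in values if val == 0]) == 2 * n

n = 5
for beta in [(-1, 1, 1, 1, 1), (-1, -1, 1, 1, 1)]:
    print(beta, is_vertex((0,) * n + beta))  # odd number of -1: True; even: False
```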

We now show that by eliminating each contextual vertex of the no-disturbance
polytope we obtain exactly one noncontextuality inequality. By eliminating
all $2^{n-1}$ contextual vertices, we obtain $2^{n-1}$ noncontextuality inequalities and
the convex hull of all noncontextual vertices, i.e., the noncontextual polytope.

Lemma 41. Let C be a contextual vertex, and consider the inequality (2.9) with
$\gamma$ equal to the full-correlation part of C. Then the intersection of the half-space
$\langle\gamma,\beta\rangle \le n-2$ with the no-disturbance polytope is the convex hull of all vertices but C.

Proof. To show this, we shall check that the vertices of the intersection of the
half-space $\langle\gamma,\beta\rangle \le n-2$ with the no-disturbance polytope are a subset of
the vertices of the no-disturbance polytope. For contradiction, suppose that
the intersection generates a new vertex $P'$ that was not a vertex of the
no-disturbance polytope. Then $\langle\gamma,P'\rangle = n-2$ (where $\langle\gamma,P'\rangle$ denotes the inner
product of $\gamma$ with the full-correlation part of $P'$) and, furthermore, $P'$ must lie on an
edge connected to C, since all the other vertices respect the inequality. Edges
of the no-disturbance polytope must saturate $2n-1$ independent positivity
conditions (2.5), and therefore $P'$ must saturate $2n-1$ inequalities which are
a subset of the $2n$ inequalities saturated by the vertex C.

Let $\beta$ be the full-correlation part of C, and $\delta$ the full-correlation part of
$P'$. For each $i$, if $\beta_i = +1$, then C saturates (2.5b) and (2.5c); if $\beta_i = -1$,
C saturates (2.5a) and (2.5d). Therefore, for every $i$ but one, say $i_0$, $P'$
must saturate both positivity conditions; but saturating them both implies
that $\delta_i = \beta_i$, leaving only $\delta_{i_0}$ free. But if we now demand that $\langle\gamma,P'\rangle = n-2$,
then $\delta_{i_0} = -\beta_{i_0}$, and therefore $P'$ is just an old noncontextual vertex. □

To summarize our results: the no-disturbance polytope has $2^n + 2^{n-1}$
vertices, of which $2^n$ are noncontextual and $2^{n-1}$ are contextual. It has $4n$
facets, which are the positivity conditions (2.5). The noncontextual polytope
has $2^n$ vertices and $4n + 2^{n-1}$ facets.

2.5.1 Quantum violations 

The Boole inequalities for the n-cycle are violated by quantum mechanics
for every $n \ge 4$. Since the inequalities for a given $n$ are all equivalent via
relabellings, it is enough to violate one of them. For odd $n$, we choose the
inequality with all $\gamma_i = -1$. The minimal dimension we need to violate
the Boole inequalities is 3, the state is always $|0\rangle$, and the observables 28 are
$A_k = 2|v_k\rangle\langle v_k| - \mathbb{1}$, where

$$|v_k\rangle = (\cos\theta,\ \sin\theta\cos\varphi_k,\ \sin\theta\sin\varphi_k),$$

with

$$\varphi_k = \frac{n-1}{n}\,\pi k \qquad\text{and}\qquad \cos^2\theta = \frac{\cos\frac{\pi}{n}}{1+\cos\frac{\pi}{n}}.$$

This choice of $\theta$ makes consecutive vectors $|v_k\rangle$ and $|v_{k+1}\rangle$ orthogonal, so that
$A_k$ and $A_{k+1}$ commute. Then $\langle 0|A_kA_{k+1}|0\rangle = -4|\langle 0|v_k\rangle|^2 + 1 = -4\cos^2\theta + 1$, and

$$B_n = n\left(\frac{4\cos\frac{\pi}{n}}{1+\cos\frac{\pi}{n}} - 1\right). \tag{2.13}$$

The noncontextual bound is $B_n \le n-2$. This inequality is saturated for $n = 3$,
and violated for all odd $n \ge 5$. To see this, it is enough to note that some simple
algebra turns $B_n > n-2$ into

$$\cos\frac{\pi}{n} > \frac{n-1}{n+1},$$

which holds for all $n > 3$.
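The odd-cycle construction can be checked numerically. The sketch below builds the vectors $|v_k\rangle$, confirms that consecutive ones are orthogonal (so consecutive $A_k$ commute), and evaluates $B_n = -\sum_k \langle 0|A_kA_{k+1}|0\rangle$ directly with $3\times3$ real matrices, comparing it with (2.13) and with the noncontextual bound $n - 2$. It uses only floating-point arithmetic from the standard library, and the function names are ours.

```python
import math

def vectors(n):
    """|v_k> = (cos t, sin t cos phi_k, sin t sin phi_k), phi_k = (n-1) pi k / n,
    with cos^2 t = cos(pi/n) / (1 + cos(pi/n))."""
    c = math.cos(math.pi / n)
    t = math.acos(math.sqrt(c / (1 + c)))
    return [(math.cos(t),
             math.sin(t) * math.cos((n - 1) * math.pi * k / n),
             math.sin(t) * math.sin((n - 1) * math.pi * k / n)) for k in range(n)]

def observable(v):
    """A = 2|v><v| - 1 as a 3x3 real matrix."""
    return [[2 * v[i] * v[j] - (1.0 if i == j else 0.0) for j in range(3)] for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def max_consecutive_overlap(n):
    """Largest |<v_k|v_{k+1}>|; zero means consecutive observables commute."""
    vs = vectors(n)
    return max(abs(sum(x * y for x, y in zip(vs[k], vs[(k + 1) % n]))) for k in range(n))

def odd_cycle_value(n):
    """B_n = -sum_k <0|A_k A_{k+1}|0>; <0|M|0> is the (0, 0) entry of M."""
    obs = [observable(v) for v in vectors(n)]
    return -sum(matmul(obs[k], obs[(k + 1) % n])[0][0] for k in range(n))

def formula_2_13(n):
    c = math.cos(math.pi / n)
    return n * (4 * c / (1 + c) - 1)

for n in (3, 5, 7):
    print(n, round(odd_cycle_value(n), 6), round(formula_2_13(n), 6), n - 2)
```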

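For even $n$, we choose the inequality for which all $\gamma_i = -1$ except for
$\gamma_{n-1} = +1$. Dimension 4 is enough to violate 29 it for all $n$, with the state

$$|\psi^-\rangle = \frac{1}{\sqrt2}\big(|01\rangle - |10\rangle\big),$$

and the observables 30 $X_k = \tilde X_k \otimes \mathbb{1}$ for even $k$ and $X_k = \mathbb{1} \otimes \tilde X_k$ for odd $k$,
where

$$\tilde X_k = \cos\frac{k\pi}{n}\,\sigma_x + \sin\frac{k\pi}{n}\,\sigma_z,$$

and $\sigma_x$, $\sigma_z$ are the Pauli matrices.
We can then check that

$$X_kX_{k+1}|\psi^-\rangle = -\cos\frac{\pi}{n}\,|\psi^-\rangle \mp \sin\frac{\pi}{n}\,|\phi^+\rangle$$

for every $k$ except $k = n-1$ (with the upper sign for even $k$ and the lower sign for odd $k$),
where $|\phi^+\rangle = \frac{1}{\sqrt2}(|00\rangle + |11\rangle)$, while

$$X_{n-1}X_0|\psi^-\rangle = \cos\frac{\pi}{n}\,|\psi^-\rangle - \sin\frac{\pi}{n}\,|\phi^+\rangle.$$

Therefore,

$$B_n = n\cos\frac{\pi}{n}, \tag{2.14}$$

so the noncontextual bound is saturated for $n = 2$, and violated for all even $n \ge 4$.

Note that in both the even and the odd cases $B_n/n \to 1$ as $n \to \infty$, so the quantum
value approaches the algebraic bound $n$.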
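The even-cycle construction can be checked in the same way. Since $\sigma_x$, $\sigma_z$ and $|\psi^-\rangle$ are real, $4\times4$ real matrices suffice. The sketch below (our own function names; computational basis ordered $|00\rangle, |01\rangle, |10\rangle, |11\rangle$) evaluates $B_n$ on $|\psi^-\rangle$ for $\gamma = (-1, \ldots, -1, +1)$ and compares it with (2.14). For $n = 4$ it gives the CHSH value $2\sqrt2$.

```python
import math

def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

I2 = [[1.0, 0.0], [0.0, 1.0]]
SX = [[0.0, 1.0], [1.0, 0.0]]
SZ = [[1.0, 0.0], [0.0, -1.0]]
PSI_MINUS = [0.0, 1 / math.sqrt(2), -1 / math.sqrt(2), 0.0]

def qubit_observable(k, n):
    """cos(k pi / n) sigma_x + sin(k pi / n) sigma_z."""
    c, s = math.cos(k * math.pi / n), math.sin(k * math.pi / n)
    return [[c * SX[i][j] + s * SZ[i][j] for j in range(2)] for i in range(2)]

def observables(n):
    """X_k acts on the first qubit for even k and on the second for odd k."""
    return [kron(qubit_observable(k, n), I2) if k % 2 == 0
            else kron(I2, qubit_observable(k, n)) for k in range(n)]

def expectation(m, psi):
    return sum(psi[i] * m[i][j] * psi[j] for i in range(4) for j in range(4))

def even_cycle_value(n):
    """B_n for gamma = (-1, ..., -1, +1), evaluated on |psi^->."""
    xs = observables(n)
    gammas = [-1] * (n - 1) + [1]
    return sum(g * expectation(matmul(xs[k], xs[(k + 1) % n]), PSI_MINUS)
               for k, g in enumerate(gammas))

for n in (4, 6, 8):
    print(n, round(even_cycle_value(n), 6), round(n * math.cos(math.pi / n), 6), n - 2)
```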



28 These states and observables are from [67].

29 We conjecture that this is in fact the minimal dimension. For n = 4 the proof is well known.
30 These states and observables are from [75].
2.5.2 The CHSH inequality

The 4-cycle is actually a Bell scenario, since every observable in the set $\{X_0, X_2\}$
commutes with every observable in the set $\{X_1, X_3\}$. Renaming $A_0 = X_0$,
$A_1 = X_2$, $B_0 = X_1$, and $B_1 = X_3$, we obtain the famous CHSH inequality [76]:

$$\langle A_0B_0\rangle + \langle A_1B_1\rangle + \langle A_1B_0\rangle - \langle A_0B_1\rangle \le 2.$$

The maximal quantum violation for it (its Tsirelson bound [ ]) is $2\sqrt2$. This
inequality was used in countless experimental tests of nonlocality, of which
the most famous are the first, by Freedman and Clauser [20], and Aspect's
[78].

2.5.3 The Klyachko inequality 

The 5-cycle was studied before by Klyachko [ ], and the following inequality
bears his name:

$$-\langle X_0X_1\rangle - \langle X_1X_2\rangle - \langle X_2X_3\rangle - \langle X_3X_4\rangle - \langle X_4X_0\rangle \le 3.$$

Its Tsirelson bound is $4\sqrt5 - 5 \approx 3.94$. It is the simplest Boole inequality that is not
also a Bell inequality and that can be violated by quantum mechanics. It was also
the first such inequality to be discovered 31. Since this inequality can be violated
by qutrits, and only requires the measurement of 5 observables, it allows one
of the simplest possible tests of noncontextuality. Such an experimental test has
in fact been carried out [ ].

2.6 Boole inequalities as graphs 

The quantum violations presented in the previous section are in fact the largest 
possible. Proving this, though, requires a bit of effort. To do that, we shall use 
the techniques from [54]. 

First of all, notice that to study the quantum violation of a Boole inequality,
we could have represented it as an operator; instead of writing it as

$$B_n = \sum_{i=0}^{n-1} \gamma_i \langle X_iX_{i+1}\rangle \le n-2,$$

we could have defined an operator

$$\mathcal{B}_n = \sum_{i=0}^{n-1} \gamma_i X_iX_{i+1}$$

such that $B_n = \langle\mathcal{B}_n\rangle$. Then the question of which is the maximal quantum
violation of a Boole inequality is answered by finding

$$\Omega_Q = \max_{\rho,\,\mathcal{B}_n} \operatorname{tr}(\rho\,\mathcal{B}_n),$$

31 Pitowsky found the inequalities for the 3-cycle in 1989 [ ], but they cannot be violated by
quantum mechanics.
where the maximization is done over all quantum states and all operators $\mathcal{B}_n$
which respect the commutation relations implied by the marginal scenario.
The first thing we notice is that, since a mixed state is a convex combination of
pure states, this maximum is always attained by a pure state. So

$$\Omega_Q = \max \operatorname{tr}\big(|\psi\rangle\langle\psi|\,\mathcal{B}_n\big) = \max \|\mathcal{B}_n\|, \tag{2.15}$$

since $\mathcal{B}_n$ is a self-adjoint operator 32. This is already useful, but not much,
since it is not easy to find operators which implement the marginal scenario.
However, for a given $\mathcal{B}_n$ we can find the maximal violation for it (and the state
which attains it) simply by diagonalizing its matrix.

We have already hinted in the beginning of the previous section that a 
marginal scenario can be encoded as a graph 33 ; however, those graphs are not 
detailed enough for our purposes, since they do not specify the structure of 
the observables down to the level of their projectors. For that, we shall need a 
graph that takes into account not only the questions to be asked, but also the 
answers: the CSW graph [54]. 

To define it, we first need to notice that any Boole inequality can be
rewritten in the CSW form, as

$$\Sigma = \sum_i p(c_i|C_i) \le \Omega_{NC}, \tag{2.16}$$

that is, as a sum of probabilities with coefficients +1, bounded by the noncontextual
bound $\Omega_{NC}$. This can always be done, since we can just write the expectation values
as probabilities, and $-p(A = a) = p(A \neq a) - 1$. For example, the term $\pm\langle X_iX_j\rangle$ becomes

$$\pm\langle X_iX_j\rangle = 2\big(p(+,\pm|X_i,X_j) + p(-,\mp|X_i,X_j)\big) - 1. \tag{2.17}$$

It is easy to see that this representation is not unique. For example, we could
write $p(+,+|X_0,X_1)$ as

$$p(+,+|X_0,X_1) = 1 - p(+,-|X_0,X_1) - p(-|X_0),$$

but this does not matter, since any representation will be good enough for our
purposes 34. For example, the inequality (2.8a) for the 3-cycle is represented as

$$p(+-|01) + p(-+|01) + p(+-|12) + p(-+|12) + p(+-|20) + p(-+|20) \le 2, \tag{2.18}$$

where for clarity we omit the ones (writing only the signs) and the Xs (writing only
the indices of the random variables). In fact, all the inequalities (2.8) for the
3-cycle have this same representation, modulo relabellings.
Now we are ready to define the CSW graph:

32 $\|\cdot\|$ is the standard operator norm, which can be calculated in polynomial time.

33 Actually, in the general case it must be a hypergraph: it is only a graph when the maximum
number of observables in a context is two.

34 The situation is more delicate when we are talking about nonlocality instead of contextuality;
then the Lovász function (which we shall define shortly) of the CSW graph will be only an upper
bound for the quantum violation [80].
Definition 42. The CSW graph of a Boole inequality in the CSW form is the graph
that has the events $c_i|C_i$ as vertices, with edges connecting exclusive events.

For example, the vertex $+-|01$ of inequality (2.18) will be connected to
the vertices $-+|01$, $+-|12$, and $+-|20$, since two events are exclusive when they
assign different outcomes to a common random variable. The CSW graph of (2.18)
is the prism graph represented in figure 2.3.




Figure 2.3: CSW graph for the 3-cycle. 
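The construction of definition 42 is mechanical enough to sketch in code. The Python sketch below (hypothetical helper names) turns an n-cycle inequality with coefficients $\gamma$ into its CSW events via (2.17), joins exclusive events, and reports the vertex degrees. For the 3-cycle it recovers the 3-regular graph on six vertices with nine edges shown in figure 2.3. As an extra check, it computes the independence number by brute force. In the CSW framework [54] this number equals the noncontextual bound of the sum of probabilities, which is 2 for (2.18).

```python
from itertools import combinations

def csw_events(gamma):
    """CSW events of sum_i gamma_i <X_i X_{i+1}> <= n - 2, obtained via (2.17).
    Each event is ((i, x_i), (j, x_j)); gamma_i = -1 gives (+,-), (-,+) and
    gamma_i = +1 gives (+,+), (-,-)."""
    n = len(gamma)
    events = []
    for i, g in enumerate(gamma):
        j = (i + 1) % n
        for a in (+1, -1):
            events.append(((i, a), (j, a * g)))
    return events

def exclusive(e, f):
    """Two events are exclusive if they give different outcomes to a shared variable."""
    return any(vi == vj and xi != xj for vi, xi in e for vj, xj in f)

def csw_graph(gamma):
    events = csw_events(gamma)
    edges = [(e, f) for e, f in combinations(events, 2) if exclusive(e, f)]
    return events, edges

def independence_number(vertices, edges):
    """Brute-force size of the largest set of pairwise non-exclusive events."""
    edge_set = {frozenset(edge) for edge in edges}
    best = 0
    for r in range(1, len(vertices) + 1):
        for subset in combinations(vertices, r):
            if all(frozenset(pair) not in edge_set for pair in combinations(subset, 2)):
                best = r
                break
    return best

events, edges = csw_graph((-1, -1, -1))  # inequality (2.8a), i.e. (2.18)
degrees = [sum(1 for edge in edges if v in edge) for v in events]
print(len(events), len(edges), degrees, independence_number(events, edges))
# 6 9 [3, 3, 3, 3, 3, 3] 2
```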



We are now almost ready to state the CSW theorem; we only need to define
what an orthonormal representation of a graph is:

Definition 43. An orthonormal representation of a graph G with vertices $v_i$ is an
assignment of projectors $\Gamma_i$ such that $v_i$ adjacent to $v_j$ implies that $\Gamma_i\Gamma_j = 0$.

Then orthonormal representations of a CSW graph will be just the
projectors associated to the events of the vertices, e.g.,

$$+-|01 \;\mapsto\; \Pi_0^{+}\,\Pi_1^{-} = \Gamma_{+-|01},$$

where $\Pi_i^{\pm}$ denotes the projector onto the $\pm1$ eigenspace of $X_i$, since projectors
associated to exclusive events are orthogonal, and the product of commuting
projectors is also a projector. Taking the sum over the expectation values of all
such $\Gamma_i$ is then just the quantum value of the inequality. The maximal such
value is then

$$\Omega_Q = \max \sum_i \operatorname{tr}(\rho\,\Gamma_i) = \max \Big\|\sum_i \Gamma_i\Big\|, \tag{2.19}$$

where the maximization is done over all quantum states and all orthonormal
representations of the CSW graph. We seem to have gotten back to equation (2.15),
but this is not the case: equation (2.19) is the definition of a famous
graph-theoretical function, the Lovász $\vartheta$ function 35 [81, 82]! It can be
calculated in polynomial time via an SDP, and its value is known for some simple
families of graphs. There is one caveat: the usual definition of the Lovász function
requires the $\Gamma_i$



35 See theorem 5 in [81]. Note, however, that Lovász's definition of an orthonormal
representation of a graph G is equivalent to our definition of an orthonormal representation of the
complement graph $\bar{G}$.
to be one-dimensional projectors, and in our case this is not always true. But
there is no gain in generality in allowing higher-dimensional projectors: since the
maximum is attained by a pure state $\rho = |\psi\rangle\langle\psi|$, we can always define the
one-dimensional projectors

$$\Gamma'_i = \frac{\Gamma_i\,\rho\,\Gamma_i}{\operatorname{tr}(\rho\,\Gamma_i)};$$

such $\Gamma'_i$ are also an orthonormal representation of G, and $\operatorname{tr}(\rho\,\Gamma'_i) = \operatorname{tr}(\rho\,\Gamma_i)$
for every $i$, so equation (2.19) is in fact equivalent to the Lovász $\vartheta$ function.
We can now state the main theorem from [54]: 

Theorem 44 (Cabello-Severini-Winter [54]). Let $\mathcal{I}$ be a Boole inequality represented in the CSW form, and $G$ its CSW graph. Then in quantum mechanics

$$\max \langle\mathcal{I}\rangle = \Omega_Q = \vartheta(G).$$
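To see the maximization in (2.19) at work on a graph whose answer is known, here is a sketch for the 5-cycle $C_5$, a hypothetical example not used elsewhere in this chapter. For rank-one projectors $\Gamma_i = |v_i\rangle\langle v_i|$ the condition of Definition 43 reduces to $\langle v_i|v_j\rangle = 0$ for adjacent vertices. Lovász's "umbrella" (unit vectors on a cone around $\psi$, with consecutive vectors orthogonal) gives $\sum_i \operatorname{tr}(\psi\Gamma_i) = \sqrt{5}$, which Lovász showed equals $\vartheta(C_5)$; for an odd $n$-cycle the same construction gives $n\cos(\pi/n)/(1+\cos(\pi/n))$. The function names are illustrative.

```python
import math

def umbrella(n):
    """Lovasz's umbrella for an odd cycle C_n: unit vectors on a cone around
    psi = (0, 0, 1), consecutive vectors rotated by pi*(n-1)/n about the axis."""
    c = math.cos(math.pi / n)
    cos2 = c / (1 + c)               # squared cosine of the cone's half-angle
    s, z = math.sqrt(1 - cos2), math.sqrt(cos2)
    step = math.pi * (n - 1) / n
    return [(s * math.cos(k * step), s * math.sin(k * step), z) for k in range(n)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def is_orthonormal_representation(vectors, edges, tol=1e-12):
    """Definition 43 for rank-one projectors |v><v|: Gamma_i Gamma_j = 0 iff <v_i|v_j> = 0."""
    return all(abs(dot(vectors[i], vectors[j])) < tol for i, j in edges)

n = 5
cycle = [(k, (k + 1) % n) for k in range(n)]
vectors = umbrella(n)
psi = (0.0, 0.0, 1.0)
value = sum(dot(psi, v) ** 2 for v in vectors)   # sum_i tr(psi Gamma_i)
print(is_orthonormal_representation(vectors, cycle), value, math.sqrt(5))
```

Any explicit orthonormal representation and state only give a lower bound on the maximum in (2.19); the umbrella is interesting because it attains it.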



2.6.1 Tsirelson bounds for the n-cycle 

As an application of theorem 44, we shall find the quantum bounds for the
Boole inequalities found in section 2.5. As these inequalities only have terms
$\pm\langle X_iX_j\rangle$, the transformation (2.17) is enough to bring them to the form
of inequality (2.16), so

$$B_n = 2\Sigma - n,$$

where $\Sigma$ is the desired sum of probabilities. To find the CSW graph for odd $n$,
the same strategy used in figure 2.3 works, so it is the prism graph $Y_n$,
and therefore the Tsirelson bound is $2\vartheta(Y_n) - n$. The Lovász function of the
prism graph is³⁶

$$\vartheta(Y_n) = \frac{2n\cos(\pi/n)}{1+\cos(\pi/n)}, \qquad 2\vartheta(Y_n) - n = \frac{3n\cos(\pi/n) - n}{1+\cos(\pi/n)},$$

thus proving that the quantum violation (2.13) is the largest possible.

To find the CSW graph for even $n$, we use the strategy represented in figure
2.4, where it is done for $n = 4$. The same construction works for every even $n$, so
the CSW graph for even $n$ is the Möbius ladder $M_{2n}$. Its Lovász function is³⁷

$$\vartheta(M_{2n}) = \frac{n}{2}\left(1 + \cos\frac{\pi}{n}\right), \qquad 2\vartheta(M_{2n}) - n = n\cos\frac{\pi}{n},$$

thus proving that the quantum violation (2.14) is in fact the largest possible.
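These closed forms can be compared with the noncontextual bound in a short brute-force sketch. It relies on the CSW result, not restated in this section, that the noncontextual bound of an inequality in CSW form is the independence number $\alpha(G)$ of its CSW graph, so the noncontextual bound on $B_n$ is $2\alpha(G) - n$. The quantum bound uses the formulas $\vartheta(Y_n) = 2n\cos(\pi/n)/(1+\cos(\pi/n))$ (odd $n$) and $\vartheta(M_{2n}) = \frac{n}{2}(1+\cos(\pi/n))$ (even $n$) rather than solving an SDP. The vertex labelling in the graph constructors is an assumption; only the graphs up to isomorphism matter.

```python
import math

def prism_edges(n):
    """Prism graph Y_n: outer cycle 0..n-1, inner cycle n..2n-1, rungs k -- n+k."""
    edges = []
    for k in range(n):
        edges.append((k, (k + 1) % n))
        edges.append((n + k, n + (k + 1) % n))
        edges.append((k, n + k))
    return edges

def mobius_edges(n):
    """Moebius ladder M_2n: a 2n-cycle plus chords k -- k+n between opposite vertices."""
    edges = [(k, (k + 1) % (2 * n)) for k in range(2 * n)]
    edges += [(k, k + n) for k in range(n)]
    return edges

def independence_number(num_vertices, edges):
    """alpha(G) by brute force over all vertex subsets (fine for ~20 vertices)."""
    edge_masks = [(1 << i) | (1 << j) for i, j in edges]
    best = 0
    for subset in range(1 << num_vertices):
        if all((subset & m) != m for m in edge_masks):
            best = max(best, bin(subset).count("1"))
    return best

def theta_prism(n):
    """Closed form for theta(Y_n), odd n, as quoted in the text."""
    c = math.cos(math.pi / n)
    return 2 * n * c / (1 + c)

def theta_mobius(n):
    """Closed form for theta(M_2n), even n, as quoted in the text."""
    return n / 2 * (1 + math.cos(math.pi / n))

for n in range(3, 8):
    if n % 2:
        alpha, theta = independence_number(2 * n, prism_edges(n)), theta_prism(n)
    else:
        alpha, theta = independence_number(2 * n, mobius_edges(n)), theta_mobius(n)
    # Bounds on B_n = 2*Sigma - n: noncontextual via alpha, quantum via theta.
    print(n, 2 * alpha - n, round(2 * theta - n, 4))
```

For $n = 5$ this gives the familiar noncontextual bound 3 against $4\sqrt{5} - 5 \approx 3.944$, for $n = 4$ the CHSH values 2 and $2\sqrt{2}$, and for $n = 3$ both bounds equal 1, so the 3-cycle admits no quantum violation.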



2.7 State-independent Boole inequalities 

All the Boole inequalities we have studied so far have quantum violations that
depend on the quantum state: they are violated by some states, but not by
others. This situation stands in contrast with the proofs of contextuality we
studied in section 1.7, which only considered predictions of quantum mechanics
that were valid for any state. It would therefore be quite surprising if we
couldn't find a Boole inequality that is violated by every quantum state.



36 As can be proved from the results of [=14, 67]. 
37 As can be proved from the results of [83]. 
Figure 2.4: CSW graph for the 4-cycle.



2.7.1 A Boole inequality from the 18-projector proof by Cabello, Estebaranz, and García-Alcaine

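The 18-projector proof [ ] translates quite directly into a state-independent
Boole inequality [41]. To see that, define $A_{ij} = 2v_{ij} - \mathbb{1}$, where $v_{ij}$ are the
projectors from figure 1.2. Then the product of four commuting such $A_{ij}$ is always equal to $-\mathbb{1}$: the four projectors in a context are orthogonal and sum to the identity, so on each of their eigenvectors exactly one factor equals $+1$ and the other three equal $-1$. Taking these products over all nine
sets of commuting $A_{ij}$ and adding them together, we get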

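$$
\begin{aligned}
\mathcal{I}_{18} = {}& -A_{12}A_{16}A_{17}A_{18} - A_{12}A_{23}A_{28}A_{29} - A_{23}A_{34}A_{37}A_{39} \\
& - A_{34}A_{45}A_{47}A_{48} - A_{45}A_{56}A_{58}A_{59} - A_{16}A_{56}A_{67}A_{69} \\
& - A_{17}A_{37}A_{47}A_{67} - A_{18}A_{28}A_{48}A_{58} - A_{29}A_{39}A_{59}A_{69} = 9\mathbb{1}, \qquad (2.20)
\end{aligned}
$$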
but a computer program can easily check that in any noncontextual theory 

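$$
\begin{aligned}
\langle\mathcal{I}_{18}\rangle = {}& -\langle A_{12}A_{16}A_{17}A_{18}\rangle - \langle A_{12}A_{23}A_{28}A_{29}\rangle - \langle A_{23}A_{34}A_{37}A_{39}\rangle \\
& - \langle A_{34}A_{45}A_{47}A_{48}\rangle - \langle A_{45}A_{56}A_{58}A_{59}\rangle - \langle A_{16}A_{56}A_{67}A_{69}\rangle \\
& - \langle A_{17}A_{37}A_{47}A_{67}\rangle - \langle A_{18}A_{28}A_{48}A_{58}\rangle - \langle A_{29}A_{39}A_{59}A_{69}\rangle \le 7. \qquad (2.21)
\end{aligned}
$$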

This Boole inequality is therefore violated by any quantum state. 
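The "computer program" mentioned above can be as simple as the following sketch, which enumerates all $2^{18}$ noncontextual assignments $A_{ij} = \pm 1$ and evaluates the left-hand side of (2.21). The only input is the list of nine contexts read off from (2.20). Each $A_{ij}$ appears in exactly two of them (contexts $i$ and $j$), so the product of the nine context products is $+1$ for any assignment: an even number of terms can be $+1$, and the value 9 is out of reach. Names are illustrative; being plain Python, it takes a second or two.

```python
# Nine contexts of (2.20), given by the two-digit labels of their observables A_ij;
# A_ij belongs exactly to contexts i and j.
CONTEXTS = [
    ("12", "16", "17", "18"), ("12", "23", "28", "29"), ("23", "34", "37", "39"),
    ("34", "45", "47", "48"), ("45", "56", "58", "59"), ("16", "56", "67", "69"),
    ("17", "37", "47", "67"), ("18", "28", "48", "58"), ("29", "39", "59", "69"),
]
LABELS = sorted({label for ctx in CONTEXTS for label in ctx})   # the 18 A_ij

def noncontextual_max():
    """Max of -sum_C prod_{A in C} A over all assignments A_ij = +-1."""
    bit = {label: k for k, label in enumerate(LABELS)}
    masks = [sum(1 << bit[label] for label in ctx) for ctx in CONTEXTS]
    best = -len(CONTEXTS)
    for assignment in range(1 << len(LABELS)):    # bit k set <=> A_k = -1
        # A context's product is -1 iff an odd number of its A's are -1,
        # so its term -prod contributes +1 in that case and -1 otherwise.
        value = sum(1 if bin(assignment & m).count("1") % 2 else -1
                    for m in masks)
        best = max(best, value)
    return best
```

Calling `noncontextual_max()` returns 7, while in quantum mechanics each of the nine products equals $-\mathbb{1}$ and the sum is $9\mathbb{1}$.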

2.7.2 A Boole inequality from Yu and Oh's 13-projector proof 

The projectors from Yu and Oh's 13-projector proof can also be used to form 
such a state-independent inequality [ ], but their inequality is not a facet 
of the noncontextual polytope, and according to our definition not a Boole 
inequality at all. Fortunately, there is a Boole inequality associated with their
projectors, found by Cabello et al. [43]. It reads

$$\langle\mathcal{I}_{YO}\rangle = 2\langle H_0\rangle + \sum_{i=1}^{3}\left(\langle Z_i\rangle + \langle Y_i^+\rangle + \langle Y_i^-\rangle + 2\langle H_i\rangle\right) - 3\sum_{k=1}^{3}\langle Z_kY_k^+Y_k^-\rangle - \sum_{Q\in C_2}\langle Q\rangle \le 25,$$

where $Z_i = \mathbb{1} - 2z_i$, $Y_i^{\pm} = \mathbb{1} - 2y_i^{\pm}$, and $H_i = \mathbb{1} - 2h_i$, as defined in section 1.7.2,
and $C_2$ is the subset of two-observable contexts of Yu and Oh's marginal
scenario. The operator $\mathcal{I}_{YO} = (25 + 8/3)\mathbb{1}$ is again proportional to the identity,
and this inequality is the one in Yu and Oh's noncontextual polytope with
the largest Tsirelson bound relative to its noncontextual bound. As this inequality was found by a computer
program, we see no need to reproduce a proof here.

2.7.3 A Boole inequality from the Peres-Mermin square 

Peres-Mermin's proof can also be adapted into such an inequality. Let $A_{ij}$
be the observables of the Peres-Mermin square as defined in equation (1.10).
Then it follows that

$$\mathcal{I}_{PM} = A_{11}A_{12}A_{13} + A_{21}A_{22}A_{23} + A_{31}A_{32}A_{33} + A_{11}A_{21}A_{31} + A_{12}A_{22}A_{32} - A_{13}A_{23}A_{33} = 6\mathbb{1},$$

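but a computer program³⁸ can easily check that in any noncontextual theory

$$\langle\mathcal{I}_{PM}\rangle = \langle A_{11}A_{12}A_{13}\rangle + \langle A_{21}A_{22}A_{23}\rangle + \langle A_{31}A_{32}A_{33}\rangle + \langle A_{11}A_{21}A_{31}\rangle + \langle A_{12}A_{22}A_{32}\rangle - \langle A_{13}A_{23}A_{33}\rangle \le 4.$$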

This Boole inequality was also found by Adán Cabello [41].
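A similar brute-force sketch over the $2^9$ noncontextual assignments confirms the bound 4. Here every observable appears in exactly one row and one column, so the product of the six signed terms is $-1$ for any assignment: an odd number of terms must equal $-1$, which caps the sum at $5 - 1 = 4$. Indices in the code are 0-based, so cell `(i, j)` stands for $A_{i+1,j+1}$; the names are illustrative.

```python
from itertools import product

# (sign, cells) for the six terms of I_PM; cell (i, j) stands for A_{i+1, j+1}.
TERMS = [
    (+1, [(0, 0), (0, 1), (0, 2)]),   # A11 A12 A13
    (+1, [(1, 0), (1, 1), (1, 2)]),   # A21 A22 A23
    (+1, [(2, 0), (2, 1), (2, 2)]),   # A31 A32 A33
    (+1, [(0, 0), (1, 0), (2, 0)]),   # A11 A21 A31
    (+1, [(0, 1), (1, 1), (2, 1)]),   # A12 A22 A32
    (-1, [(0, 2), (1, 2), (2, 2)]),   # -A13 A23 A33
]

def pm_value(a):
    """Value of I_PM for a noncontextual assignment a[i][j] = +-1."""
    total = 0
    for sign, cells in TERMS:
        term = sign
        for i, j in cells:
            term *= a[i][j]
        total += term
    return total

def noncontextual_max():
    """Brute force over all 2**9 assignments of +-1 to the nine A_ij."""
    return max(pm_value([vals[0:3], vals[3:6], vals[6:9]])
               for vals in product((1, -1), repeat=9))
```

The all-$+1$ assignment already reaches the maximum 4, against the quantum value 6.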

Note that in all these inequalities the operator $\mathcal{I}$ was proportional to
the identity, but this is not a required condition for a state-independent violation:
we only need $\langle\mathcal{I}\rangle_\psi$ to be larger than the noncontextual bound for every $\psi$.
It is an open question whether there is a Boole inequality that satisfies the latter
condition but not the former³⁹.



38 Or in fact yourself, by some playing around with the triangle inequality. 
39 It is trivial, however, to generate such inequalities that are not facets of the noncontextual 
polytope. 



Conclusion 



The attentive reader might have noticed that despite hints of quantum magic as 
the motivation for this thesis, there has been almost no mention of it in the 
technical parts of the text. In part this is because of the limitations of time 
and space, but more importantly because I believe that to really understand 
quantum magic, we must understand the foundations of quantum mechanics 
first; and this latter understanding is still sorely lacking. The goal of this thesis 
was therefore to help with this point. 

This goal can be naturally split in two parts (if not in two chapters): first, 
to summarize old research in a clear and consistent way, and second (and 
more important), to expose new research that is not as widely known as I 
think it deserves to be. 

Specifically, I hope to have convinced the reader that the formulation of 
noncontextuality exposed in chapter 2 is a fruitful way of separating "classical" 
phenomena from those that are truly quantum. The way ahead is to actually 
pick up those fruits: develop information processing protocols that derive 
their strength from the violation of Boole inequalities. In a sense, this work
has already begun: we know that the higher-than-classical power of quantum 
random access codes comes from contextuality [84], and [{17] has a very 
colourful description of a game in which contextuality boosts the chance of 
success. 

But, in my opinion, these protocols lack a deeper appeal, since it's not 
clear if the fact that they have a quantum advantage means anything other 
than the fact that they have a quantum advantage. What would really please 
me is to find a connection between contextuality and a discovery that has 
far-reaching implications in physics, mathematics, and computer science: 
quantum computing. 
Appendix A

The Bell-Mermin model 



This ontological model was first proposed by Bell in 1964 [ ], in order to 
provide a counterexample to von Neumann's theorem [ ], and later cleaned 
up by David Mermin [ ]. It is certainly the simplest deterministic ontological 
model out there, having been constructed to describe the statistics coming 
from the measurement of any observable of a pure qubit. It is not contextual, 
but if extended to mixed states it would have to be preparation-contextual, by 
Spekkens' theorem, and if extended to higher dimensions it would become 
measurement-contextual, by Gleason's theorem. It also can't be extended to 
describe POVMs, by Busch's theorem. In a sense, then, it is the best that a
realist committed to noncontextuality can do.

This model is quite out of fashion, as it describes measurements of observables rather than of
their projectors; but we shall do it no violence by "fixing" this feature. The
concerned reader may do it himself quite easily, or simply consult Harrigan's 
work [ ]. 

We formulate it by representing a two-dimensional self-adjoint observable
$A$ in the Bloch basis, as

$$A = a_0\mathbb{1} + \vec a\cdot\vec\sigma,$$

where $a_0 \in \mathbb{R}$, $\vec a \in \mathbb{R}^3$, and $\vec\sigma$ is the vector of Pauli matrices.

The ontic space $\Lambda = S^2 \times S^2$ is the Cartesian product of two unit spheres.
In the first one we shall embed the pure states via their Bloch vector $\vec\psi \in S^2$,
defined by $\psi = \frac{1}{2}(\mathbb{1} + \vec\psi\cdot\vec\sigma)$, and in the second one we shall use an auxiliary
unit vector $\vec\lambda$.

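The ontic state is then

$$\mu_\psi(\vec\lambda_\psi, \vec\lambda) = \delta(\vec\lambda_\psi - \vec\psi),$$

with $\vec\lambda$ uniformly distributed over the second sphere, and the response function is

$$\xi_A(\vec\lambda_\psi, \vec\lambda) = a_0 + \|\vec a\|\operatorname{sign}\big(\vec a\cdot(\vec\lambda + \vec\lambda_\psi)\big). \qquad (A.1)$$

Notice that, given $\vec\lambda_\psi$ and $\vec\lambda$, it deterministically gives $a_0 + \|\vec a\|$ or $a_0 - \|\vec a\|$, the two eigenvalues of $A$, as
required.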

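To recover the quantum statistics, we take the uniform average of $\xi_A$ over $\vec\lambda$:

$$
\begin{aligned}
\langle A\rangle &= \int_\Lambda d\vec\lambda_\psi\, d\vec\lambda\; \mu_\psi(\vec\lambda_\psi,\vec\lambda)\,\xi_A(\vec\lambda_\psi,\vec\lambda) \\
&= a_0 + \|\vec a\| \int d\vec\lambda_\psi\, d\vec\lambda\; \delta(\vec\lambda_\psi - \vec\psi)\operatorname{sign}\big(\vec a\cdot(\vec\lambda + \vec\lambda_\psi)\big) \\
&= a_0 + \|\vec a\| \int_{S^2} d\vec\lambda\; \operatorname{sign}\big(\vec a\cdot(\vec\lambda + \vec\psi)\big) \\
&= a_0 + \frac{\|\vec a\|}{4\pi} \int_0^{2\pi}\!\!\int_0^{\pi} \sin\theta_{a\lambda}\, d\theta_{a\lambda}\, d\phi_{a\lambda}\, \operatorname{sign}\big(\cos\theta_{a\lambda} + \hat a\cdot\vec\psi\big) \\
&= a_0 + \frac{\|\vec a\|}{2}\left(\int_0^{\arccos(-\hat a\cdot\vec\psi)} \sin\theta_{a\lambda}\, d\theta_{a\lambda} - \int_{\arccos(-\hat a\cdot\vec\psi)}^{\pi} \sin\theta_{a\lambda}\, d\theta_{a\lambda}\right) \\
&= a_0 + \vec a\cdot\vec\psi \\
&= \operatorname{tr}(A\psi),
\end{aligned}
$$

where $d\vec\lambda$ is the normalized uniform measure on $S^2$, and $\theta_{a\lambda}$, $\phi_{a\lambda}$ are the spherical angles of $\vec\lambda$ measured from the axis $\hat a = \vec a/\|\vec a\|$.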
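The derivation can also be checked numerically. The sketch below samples $\vec\lambda$ uniformly on $S^2$ (by normalizing an isotropic Gaussian vector, a standard trick assumed here), evaluates the response function (A.1) with $\vec\lambda_\psi = \vec\psi$, and compares the empirical mean with $a_0 + \vec a\cdot\vec\psi$. The observable and the state are arbitrary hypothetical choices; ties in the sign function occur with probability zero, and the code sends them to $+1$.

```python
import math
import random

def random_unit_vector(rng):
    """Uniform sample from S^2: normalize an isotropic Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        r = math.sqrt(sum(x * x for x in v))
        if r > 1e-12:
            return [x / r for x in v]

def response(a0, a, psi_vec, lam):
    """Equation (A.1): a0 + ||a|| sign(a . (lambda + psi)); ties go to +1."""
    norm_a = math.sqrt(sum(x * x for x in a))
    s = sum(ai * (li + pi) for ai, li, pi in zip(a, lam, psi_vec))
    return a0 + norm_a if s >= 0 else a0 - norm_a

def average(a0, a, psi_vec, samples=100_000, seed=1):
    """Monte Carlo estimate of <A>: average over lambda with psi fixed."""
    rng = random.Random(seed)
    return sum(response(a0, a, psi_vec, random_unit_vector(rng))
               for _ in range(samples)) / samples

a0, a = 0.3, [0.5, -0.2, 0.7]          # hypothetical observable A = a0*1 + a.sigma
psi_vec = [1 / math.sqrt(3)] * 3       # hypothetical Bloch vector of a pure state
print(average(a0, a, psi_vec), a0 + sum(x * y for x, y in zip(a, psi_vec)))
```

Each individual call to `response` returns one of the two eigenvalues $a_0 \pm \|\vec a\|$; only the average over $\vec\lambda$ reproduces $\operatorname{tr}(A\psi)$, up to statistical error of order $1/\sqrt{N}$.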
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